When a 6.7B model looks as good as GPT-4, it's mostly because the model has been overfit in a way that favors certain benchmarks.