[dupe]
More discussion: https://news.ycombinator.com/item?id=41046540
https://news.ycombinator.com/item?id=41046773
405B is already being served on WhatsApp!
https://ibb.co/kQ2tKX5
MMLU-Pro is the benchmark I trust the most. I noticed they are using 5-shot prompting with CoT. Is that true for GPT-4 and Sonnet as well?