Happy to see the term "inference-time compute" being used widely nowadays - it's a much more precise and appropriate term than the unwieldy "test-time compute" that OpenAI called it back when they thought they had "invented" scaling inference-time compute.
The linked "Bitter Lesson" essay by Rich Sutton is so good!
What's the point of such inference-time compute if the verifier is itself an 8B model? Am I missing something?
ELI5?
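For anyone puzzled by the verifier question above, here's a minimal sketch of the best-of-N idea being discussed (generate_candidate and verifier_score are hypothetical stand-ins for the generator LLM and the 8B reward model, not the blog's actual code). The intuition: the verifier only has to rank complete solutions, which is often easier than producing them, so even a small verifier can usefully pick among many generator samples.

    import random

    def generate_candidate(prompt: str) -> str:
        """Stand-in for sampling one solution from the generator model."""
        return f"{prompt} -> candidate #{random.randint(0, 9999)}"

    def verifier_score(prompt: str, candidate: str) -> float:
        """Stand-in for a reward model scoring a (prompt, candidate) pair."""
        return random.random()

    def best_of_n(prompt: str, n: int = 16) -> str:
        """Sample n candidates, return the one the verifier scores highest."""
        candidates = [generate_candidate(prompt) for _ in range(n)]
        return max(candidates, key=lambda c: verifier_score(prompt, c))

    print(best_of_n("Solve: 12 * 13 = ?", n=16))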
Full blog is here: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling...
Happy to answer any questions about these methods.