The tool actually does nothing; it's just a scratch pad for the model, if I understand it correctly. Fascinating that something that simple significantly improves performance across a variety of use cases. It shows there is still quite a lot of room for optimization in LLMs.
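For concreteness, here's a minimal sketch of what such a no-op tool might look like. This assumes the JSON-schema tool format used by the Anthropic Messages API; the exact name, description, and handler are illustrative, not Anthropic's actual implementation.

```python
# Hypothetical "think"-style tool: the schema gives the model a designated
# place to write notes, and the handler deliberately does nothing with them.
THINK_TOOL = {
    "name": "think",
    "description": (
        "Use this tool to think about something. It does not obtain new "
        "information or change anything; it only records the thought."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {
                "type": "string",
                "description": "A thought to write down.",
            }
        },
        "required": ["thought"],
    },
}

def handle_think(tool_input: dict) -> str:
    # Intentionally a no-op: the benefit comes from the model pausing to
    # write out intermediate reasoning, not from the tool's return value.
    return "OK"
```

The handler just acknowledges the call, so from the model's side the "tool use" is pure scratch space inserted into the conversation.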
It is a hack, and SUCH a fascinating one. Some aspect of the model’s training allows this form of reasoning to stack with its other reasoning strategies rather than conflict with them.
Also - I love to see Anthropic poke their own product in a new way and publish what amounts to “look at this thing it does!”
We are VERY early in AI.