First Proof

by wanderingmind on 2/11/2026, 3:43 AM with 1 comment

by matjet on 2/13/2026, 11:37 AM

The competition is about to end, and no model has succeeded so far. Given the incentives for top labs and the short time a successful automated solution would need, this gives us a reliable upper bound on the capability of current models, better than anything we get from the usual benchmaxed datasets.

What I would like to see is an easier version of this same format.