Hacker News

by nobody9999 on 5/26/2025, 7:13 AM with 1 comment

by nobody9999 on 5/26/2025, 7:14 AM

>...In this paper, we present Vending-Bench, a simulated environment designed to specifically test an LLM-based agent's ability to manage a straightforward, long-running business scenario: operating a vending machine. Agents must balance inventories, place orders, set prices, and handle daily fees, tasks that are each simple but collectively, over long horizons (>20M tokens per run), stress an LLM's capacity for sustained, coherent decision-making.

Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents