Smart code reading for humans and AI agents. Tilth is what happens when you give ripgrep, tree-sitter, and cat a shared brain.
--
v0.4.4: Added adaptive 2nd-hop impact analysis to callers search — when a function has ≤10 unique callers, tilth automatically traces callers-of-callers in a single scan. First full 26-task Opus baseline (previously 5 hard tasks only). Haiku adoption improved from 42% to 78%, flipping Haiku from a cost regression to -38% $/correct.
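The adaptive 2nd-hop idea boils down to: fetch the direct callers, and only when that set is small (≤10), also fetch each caller's callers. A minimal sketch of that shape, with hypothetical names and a toy call graph (none of this is tilth's actual internals; only the ≤10 cutoff comes from the release note):

```python
SECOND_HOP_LIMIT = 10  # from the release note: expand only when <=10 unique callers


def find_callers(call_graph, target):
    """Return the set of functions whose callee list contains `target`."""
    return {fn for fn, callees in call_graph.items() if target in callees}


def callers_with_impact(call_graph, target):
    """First-hop callers, plus callers-of-callers when the first hop is small."""
    first_hop = find_callers(call_graph, target)
    result = {target: sorted(first_hop)}
    if len(first_hop) <= SECOND_HOP_LIMIT:
        for caller in first_hop:
            result[caller] = sorted(find_callers(call_graph, caller))
    return result


# Toy example: main -> handle -> parse, retry -> parse
graph = {
    "main": {"handle"},
    "handle": {"parse"},
    "retry": {"parse"},
}
print(callers_with_impact(graph, "parse"))
```

The point of the cutoff is cost control: the second hop is only worth a combined scan when the first hop is small enough that the extra results stay readable.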
v0.4.5: Bumped TOKEN_THRESHOLD from 3500 to 6000 estimated tokens (~24KB), so mid-sized files return full content instead of an outline that agents then read back via 5–7 sequential --section calls. Fixed two major regressions: gin_radix_tree (+35% → ~tie) and rg_search_dispatch (+90% → -26% win). Sonnet hit 100% accuracy (52/52) and -34% $/correct overall.
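The threshold change is just a size cutoff on the estimated token count: at or below it, return the whole file; above it, fall back to the outline. A rough sketch, where the ~4-bytes-per-token estimate and the function names are my assumptions and only the 6000 figure comes from the release note:

```python
TOKEN_THRESHOLD = 6000   # v0.4.5 raised this from 3500
BYTES_PER_TOKEN = 4      # rough heuristic: ~24KB of source ~= 6000 tokens


def estimate_tokens(text: str) -> int:
    """Cheap token estimate from byte length; avoids running a real tokenizer."""
    return len(text.encode("utf-8")) // BYTES_PER_TOKEN


def choose_view(text: str) -> str:
    """Return 'full' for small/mid files, 'outline' for large ones."""
    if estimate_tokens(text) <= TOKEN_THRESHOLD:
        return "full"
    return "outline"
```

The trade-off it encodes: an outline that the agent immediately re-reads section by section costs more round trips (and tokens) than just emitting the file once, so the cutoff should sit above typical mid-sized files.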
--
https://github.com/jahala/tilth/
Full results: https://github.com/jahala/tilth/blob/main/benchmark/README.m...
-- PS: I don't have the budget to run the benchmark often (especially with Opus), so if any token whales have capacity to run some benchmarks, please feel free to PR results.
All contributions are welcome, especially more benchmarks for other models!