Scaling long-running autonomous coding

by srameshc on 1/20/2026, 12:23 AM with 110 comments

Related: Scaling long-running autonomous coding - https://news.ycombinator.com/item?id=46624541 - Jan 2026 (187 comments)

by light_hue_1 on 1/20/2026, 8:41 AM

Browsers are pretty much the best case scenario for autonomous coding agents. A totally unique situation that mostly doesn't occur in the real world.

At a minimum:

1. You've got an incredibly clearly defined problem at the high level.

2. Extremely thorough tests for every part that build up in complexity.

3. Libraries, APIs, and tooling that are all compatible with one another because all of these technologies are built to work together already.

4. It's inherently a soft problem; you can make partial progress on it.

5. There's a reference implementation you can compare against.

6. You've got extremely detailed documentation and design docs.

7. It's a problem that inherently decomposes into separate components in a clear way.

8. The models are already trained not just on examples for every module, but on example browsers as a whole.

9. The done condition for this isn't a working browser, it's displaying something.

This isn't a realistic setup for anything that 99.99% of people work on. It's not even a realistic setup for what actual browser developers do, who must implement new or fuzzy things that aren't in the specs.

Note point 9. That's critical. Getting to the point where you can show simple pages is one thing. Getting to the point where you have a working production browser engine is not just 80% more work; it's probably considerably more than 100x more work.

by simonw on 1/20/2026, 2:56 AM

One of the big open questions for me right now concerns how library dependencies are used.

Most of the big ones are things like skia, harfbuzz, wgpu - all totally reasonable IMO.

The two that stand out for me as more notable are html5ever for parsing HTML and taffy for handling CSS grids and flexbox - that's vendored with an explanation of some minor changes here: https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...

Taffy is a solid library choice, but it's probably the most robust ammunition for anyone who wants to argue that this shouldn't count as a "from scratch" rendering engine.

I don't think it detracts much if at all from FastRender as an example of what an army of coding agents can help a single engineer achieve in a few weeks of work.

by tabs_or_spaces on 1/21/2026, 5:22 AM

> I think somebody will have built a full web browser mostly using AI assistance, and it won’t even be surprising

> When I made my 2029 prediction this is more-or-less the quality of result I had in mind.

There seems to be a lot of compensation and leniency made by the author here.

So, it is seemingly impressive that someone was able to use agents to build a browser.

But they used trillions of tokens? This equates to millions of dollars of spend. Are we really happy with this?

The browser itself is not fully complete; there are rendering glitches noted in the article. So millions of dollars for something that has obvious bugs.

This is also pure agent code. Can a code base like this ever be maintained by a team of humans? Are you vendor locked into a specific model if you want to build more features? How will support work? How will releases work? The lack of reflection over the rest of the software lifecycle except building is shocking.

So I'm not sure after reflecting, whether any of this is impressive outside of "someone with unlimited tokens built a browser using ai agents". It's the same class of problem being solved over and over again. Nothing new is really being done here.

Maybe it's just me but there's much more to software than just building.

by andrewchambers on 1/20/2026, 11:10 AM

Test suites just increased in value by a lot and code decreased in value.

by retinaros on 1/20/2026, 8:11 AM

Agentic coding is a card castle built on another card castle (test-time compute) built on another card castle (token prediction). The mere fact that using lots of iterations and compute works maybe tells us that nothing is really elegant about the things we craft.

by halfcat on 1/20/2026, 4:05 AM

So AI makes it cheaper to remix anything already-seen, or anything with a stable pattern, if you’re willing to throw enough resources at it.

AI makes it cheap (eventually almost free) to traverse the already-discovered and reach the edge of uncharted territory. If we think of a sphere, where we start at the center, and the surface is the edge of uncharted territory, then AI lets you move instantly to the surface.

If anything solved becomes cheap to re-instantiate, does R&D reach a point where it can’t ever pay off? Why would one pay for the long-researched thing when they can get it for free tomorrow? There will be some value in having it today, just like having knowledge about a stock today is more valuable than the same knowledge learned tomorrow. But does value itself go away for anything digital, and only remain for anything non-copyable?

The volume of a sphere grows faster than the surface area. But if traversing the interior is instant and frictionless, what does that imply?
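For concreteness, the standard formulas behind that observation:

```latex
V = \tfrac{4}{3}\pi r^{3}, \qquad A = 4\pi r^{2}, \qquad \frac{V}{A} = \frac{r}{3}
```

The ratio V/A grows linearly with r, so the interior (the already-discovered) dwarfs the frontier as the sphere grows.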

by vedmakk on 1/20/2026, 6:28 AM

After reading that post it feels so basic to sit here, watching my single humble claude code agent go along with its work... confident, but brittle and so easily distracted.

by ramon156 on 1/20/2026, 9:38 AM

I would also love to see the statistics regarding token cost, electricity cost, environmental damage etc.

Not saying that this only happens with LLMs; in fact, it should be compared against e.g. a dev team of 4-5.

by Chipshuffle on 1/20/2026, 10:19 AM

The more I think about LLMs, the stranger it feels trying to grasp what they are. To me, when I'm working with them, they don't feel intelligent but rather like an attempt at mimicking intelligence. You can never trust that the AI actually did something smart or dumb. The judge always has to be you.

Its ability to pattern-match its way through a code base is impressive until it isn't, and you always have to pull it back to reality when it goes astray.

Its ability to plan ahead is so limited, and its way of "remembering" is so basic. Every day it's a bit like 50 First Dates.

Nonetheless, seeing what can be achieved with this pseudo-intelligence tool makes me feel a little in awe. It's the contrast between not being intelligent and achieving clearly useful outcomes if steered correctly, and the feeling that we have just started to understand how to interact with this alien.

by gforce_de on 1/21/2026, 12:17 PM

Wow, for screenshots it's much faster than Chromium:

  $ time target/release/fetch_and_render "https://www.lauf-goethe-lauf.de/"
  real 0m0,685s
  user 0m0,548s
  sys 0m0,070s
  
  $ time chromium --headless --disable-gpu --screenshot=out.png --window-size=1200,800 https://www.lauf-goethe-lauf.de/
  real 0m1,099s
  user 0m0,927s
  sys 0m0,692s
  # edit: with a hot-standby Chrome and a running Node instance I can reach 0,369s here

by tinyhouse on 1/20/2026, 3:12 AM

Well, software is measured over time. The devil is always in the details.

by polyglotfacto on 1/20/2026, 5:21 PM

I'm a maintainer of Servo which is another web engine project.

Although I dissented on the decision, we banned the use of AI. Outside of the project I've been enjoying agentic coding and I do think it can be used already today to build production-grade software of browser-like complexity.

But this project shows that autonomous agents without human oversight are not the way forward.

Why? Because the generated code makes little sense from a conceptual perspective and does not provide a foundation on which to eventually build an entire web engine.

For example, I've just looked into the IndexedDB implementation, which happens to be what I am working on at the moment in Servo.

Now, my work in Servo is incomplete, but conceptually the code that is in place makes sense and there is a clear path towards eventually implementing the thing as a whole.

In FastRender, you see an Arc<Mutex<Database>>, which is never going to work, because by definition a production browser engine will have to involve multiple processes. That doesn't mean you need the IPC in a prototype, but you certainly should not have shared state; some simple messaging between threads or tasks would do.

The above is an easy coding fix for the AI, but it requires input from a human with a pretty good idea of what the architecture should look like.

For comparison, when I look at the code in Ladybird, yet another browser project, I can immediately find my way around what is for me a stranger codebase: not just a single file, but across large swaths of the project, and understand things like how their rendering loop works. With FastRender I find it hard to find my way around, despite all the architectural diagrams in the README.

So what do I propose instead of long-running autonomous agents? The focus should shift towards demonstrating how AI can effectively assist humans in building well-architected software. The AI is great at coding, but you eventually run into what I call conceptual bottlenecks, which can be overcome with human oversight. I've written about this elsewhere: https://medium.com/@polyglot_factotum/on-writing-with-ai-87c...

There is one very good idea in the project: adding the web standards directly in the repo so it can be used as context by the AI and humans alike. Any project can apply this by adding specs and other artifacts right next to the code. I've been doing this myself with TLA+, see https://medium.com/@polyglot_factotum/tla-in-support-of-ai-c...

To further ground the AI code output, I suggest telling it to document the code with the corresponding lines from the spec.

Back in early 2025 when we had those discussions in Servo about whether to allow some use of AI, I wrote this guide https://gist.github.com/gterzian/26d07e24d7fc59f5c713ecff35d... which I think is also the kind of context you want to give the AI. Note that this was back in the days of accepting edits with tabs...

by daxfohl on 1/20/2026, 5:38 PM

So we've graduated from unmaintainable slop code to unusable slop products. Sorry, this just doesn't feel like progress toward any meaningful future. But I'm sure it will unburden lots of investors of their money.

by anilgulecha on 1/20/2026, 2:49 AM

That's a wild idea: a browser from scratch! And Ladybird has been moving at a snail's pace for a long time..

I think good abstraction design and a good test suite will make or break the success of future coding projects.

by vivzkestrel on 1/20/2026, 3:19 AM

I am waiting for that guy or team that uses LLMs to write the most optimal version of Windows in existence, something that even surpasses what Microsoft has done over the years. Honestly, looking at the current state of Windows 11, it really feels like it shouldn't even be that hard to make something more user-friendly.