Eight more months of agents

by arrowsmith on 2/8/2026, 11:00 AM with 241 comments

by entropyneur on 2/10/2026, 8:26 AM

> I deeply appreciate hand-tool carpentry and mastery of the art, but people need houses and framing teams should obviously have skillsaws.

Where are all the new houses? I admit I am not a bleeding edge seeker when it comes to software consumption, but surely a 10x increase in the industry output would be noticeable to anyone?

by xyzsparetimexyz on 2/10/2026, 7:24 AM

> Pay through the nose for Opus or GPT-7.9-xhigh-with-cheese. Don't worry, it's only for a few years.

> You have to turn off the sandbox, which means you have to provide your own sandbox. I have tried just about everything and I highly recommend: use a fresh VM.

> I am extremely out of touch with anti-LLM arguments

'Just pay out the arse and run models without a sandbox or in some annoying VM just to see them fail. Wait, some people are against this?'

by happytoexplain on 2/9/2026, 7:04 PM

I don't trust the idea of "not getting", "not understanding", or "being out of touch" with anti-LLM (or pro-LLM) sentiment. There is nothing complicated about this divide. The pros and cons are both as plain as anything has ever been. You can disagree - even strongly - with either side. You can't "not understand".

by joefourier on 2/9/2026, 7:30 PM

The author is correct in that agents are becoming more and more capable and that you don't need the IDE to the same extent, but I don't see that as good. I find that IDE-based agentic programming actually encourages you to read and understand your codebase as opposed to CLI-based workflows. It's so much easier to flip through files, review the changes it made, or highlight a specific function and give it to the agent, as opposed to through the CLI where you usually just give it an entire file by typing the name, and often you just pray that it manages to find the context by itself. My prompts in Cursor are generally a lot more specific and I get more surgical results than with Claude Code in the terminal purely because of the convenience of the UX.

But secondly, there's an entire field of LLM-assisted coding that's being almost entirely neglected, and that's code autocomplete models. Fundamentally they're the same technology as agents and should be doing the same things: indexing your code in the background, filtering the context, etc. But there's much less attention paid to them, and it does feel like the models are stagnating.

I find that very unfortunate. Compare the two workflows:

With a normal coding agent, you write your prompt, then you have to wait at least a full minute for the result (generally more, depending on the task), breaking your flow and forcing you to task-switch. Then it gives you a giant mass of code, and of course 99% of the time you just approve and test it because it's a slog to read through what it did. If it doesn't work as intended, you get angry at the model and retry your prompt, spending more tokens the longer your chat history grows.

But with LLM-powered auto-complete, when you want, say, a function to do X, you write your comment describing it first, just like you should if you were writing it yourself. You instantly see a small section of code and if it's not what you want, you can alter your comment. Even if it's not 100% correct, multi-line autocomplete is great because you approve it line by line and can stop when it gets to the incorrect parts, and you're not forced to task switch and you don't lose your concentration, that great sense of "flow".

Fundamentally it's not that different from agentic coding - except instead of prompting in a chatbox, you write comments in the files directly. But I much prefer the quick feedback loop, the ability to ignore outputs you don't want, and the fact that I don't feel like I'm losing track of what my code is doing.
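The line-by-line acceptance the comment describes can be sketched as a toy loop (this is an illustration of the mechanic, not any editor's actual autocomplete API; `accept_lines` and its `keep` predicate are hypothetical names):

```python
def accept_lines(suggestion: str, keep) -> str:
    """Accept a multi-line completion one line at a time.

    `keep` is a predicate standing in for the human tapping Tab:
    acceptance stops at the first rejected line, so a partially
    correct suggestion still yields its good prefix.
    """
    accepted = []
    for line in suggestion.splitlines():
        if not keep(line):
            break  # stop as soon as the suggestion goes off the rails
        accepted.append(line)
    return "\n".join(accepted)

# Toy run: keep lines until the completion makes a mistake.
suggestion = "total = 0\nfor x in xs:\n    total += x * 2  # oops, wrong"
kept = accept_lines(suggestion, lambda line: "oops" not in line)
```

The point of the design is that a 70%-correct suggestion is still useful, because rejecting the bad tail costs nothing and never breaks your flow.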

by dagss on 2/9/2026, 7:20 PM

    But if you try some penny-saving cheap model like Sonnet [..bad things..]. [Better] pay through the nose for Opus.
After blowing $800 of my bootstrap startup's funds on Cursor with Opus for myself in a very productive January, I figured I had to try to change things up... so this month I'm jumping between Claude Code and Cursor, sometimes writing the plans and having the conversation in Cursor and dumping the implementation plan into Claude.

Opus in Cursor is just so much more responsive and easy to talk to, compared to Opus in Claude Code.

Cursor has this "Auto" mode which feels like it has very liberal limits (amortized cost, I guess) that I'm also trying to use more, but -- I don't really like flipping a coin and, if it lands heads, wasting half an hour discovering the LLM made a mess before trying again with a forced model.

Perhaps in March I'll bite the bullet and take this author's advice.

by 0xbadcafebee on 2/10/2026, 7:49 AM

Local models are decent now. Qwen3 coder is pretty good and decently fast. I use smaller models (qwen2.5:1.5b) with keyboard shortcuts and speech-to-text to ask for man page entries, and get 'em back faster than my internet connection and a "robust" frontier model does. And web search/RAG hides a multitude of sins.

"Using anything other than the frontier models is actively harmful" - so how come I'm getting solid results from Copilot and Haiku/Flash? Observe, Orient, Decide, Act, Review, Modify, Repeat. Loops with fancy heuristics, optimized prompts, and decent tools have good results with most models released in the past year.
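The observe/decide/act loop being described is model-agnostic, which is the commenter's point. A minimal sketch (the `model`, `observe`, and `act` callables are placeholders; a real harness would put an HTTP call to a local or hosted model behind `model`):

```python
def agent_loop(model, observe, act, max_steps=10):
    """A bare Observe-Orient-Decide-Act loop.

    `model` maps an observation string to a decision string; "done"
    signals completion. The harness doesn't care whether `model` is a
    1.5B local model or a frontier API.
    """
    transcript = []
    for _ in range(max_steps):
        observation = observe()
        decision = model(observation)   # Orient + Decide
        transcript.append(decision)
        if decision == "done":          # model signals it's finished
            break
        act(decision)                   # Act, then loop back to Observe
    return transcript

# Toy run with a stubbed "model" so the loop is testable offline.
state = {"n": 0}
result = agent_loop(
    model=lambda obs: "done" if obs == "2" else "increment",
    observe=lambda: str(state["n"]),
    act=lambda decision: state.update(n=state["n"] + 1),
)
```

With a loop this simple, most of the quality comes from what `observe` puts in front of the model and what tools `act` exposes, which is consistent with the claim that heuristics and tooling matter as much as the model.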

by symfrog on 2/10/2026, 8:05 AM

Any sufficiently complicated LLM generated program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of an open source project.

by dirkc on 2/9/2026, 7:12 PM

> Using anything other than the frontier models is actively harmful

If that is true, why should one invest in learning now rather than waiting for 8 months to learn whatever is the frontier model then?

by dmk on 2/9/2026, 7:19 PM

The real insight buried in here is "build what programmers love and everyone will follow." If every user has an agent that can write code against your product, your API docs become your actual product. That's a massive shift.

by post-it on 2/9/2026, 7:52 PM

> Agent harnesses have not improved much since then. There are things Sketch could do well six months ago that the most popular agents cannot do today.

I think this is a neglected area that will see a lot of development in the near future. I think that even if development on AI models stopped today - if no new model was ever trained again - there are still decades of innovation ahead of us in harnessing the models we already have.

Consider ChatGPT: the first release relied entirely on its training data to answer questions. Today, it typically does a few Google searches and summarizes the results. The model has improved, but so has the way we use it.
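That search-then-summarize pattern is itself a harness improvement, and it fits in a few lines. A hedged sketch (the `search` and `llm` callables are stand-ins for a real search API and a real model call; `answer_with_search` is a hypothetical name):

```python
def answer_with_search(question, search, llm, k=3):
    """Search-augmented answering: fetch sources, then let the model
    summarize them instead of relying on its training data.

    `search` returns a list of dicts with a "snippet" key; `llm` maps
    a prompt string to a completion. Both are placeholders here.
    """
    snippets = [r["snippet"] for r in search(question)[:k]]
    prompt = (
        f"Question: {question}\n\n"
        "Sources:\n" + "\n".join(snippets) + "\n\n"
        "Answer using only the sources above."
    )
    return llm(prompt)

# Toy run with stubs standing in for the search API and the model.
fake_search = lambda q: [{"snippet": "Go 1.0 was released in March 2012."}]
fake_llm = lambda p: "March 2012" if "March 2012" in p else "unknown"
reply = answer_with_search("When was Go 1.0 released?", fake_search, fake_llm)
```

Nothing about the model's weights changed between the two behaviors; the difference is entirely in what the harness puts into the prompt, which is the comment's "decades of innovation in harnessing" point.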

by dang on 2/9/2026, 7:08 PM

Related. Others?

How I program with agents - https://news.ycombinator.com/item?id=44221655 - June 2025 (295 comments)

by hasperdi on 2/9/2026, 6:59 PM

> It sounds like someone saying power tools should be outlawed in carpentry.

I see this a lot here

by webdevver on 2/10/2026, 1:10 PM

>I wish I could share this joy with the people who are fearful about the changes agents are bringing.

The 'fear' is about losing one's livelihood and getting locked out of homeownership and financial security. It's not complicated. Life is largely determined by your access to capital, despite whatever fresh coping strategy the afflicted (and the afflicting) like to peddle.

The relationship between quality of life and capital availability is very non-linear. There is a step-change around the $500k mark where you reach 'orbital velocity': as long as you don't suffer severe misfortune or make mistakes, you will start accelerating upwards (albeit very slowly).

Under that line, you are constantly having to fight 'gravity'.

Basically everyone in tech is openly or quietly aiming to get there, and LLMs have made that trek more precarious than before.

by monus on 2/9/2026, 9:07 PM

> Along the way I have developed a programming philosophy I now apply to everything: the best software for an agent is whatever is best for a programmer.

Not a plug but really that’s exactly why we’re building sandboxes for agents with local laptop quality. Starting with remote xcode+sim sandboxes for iOS, high mem sandbox with Android Emulator on GPU accel for Android.

No machine allocation but composable sandboxes that make up a developer persona’s laptop.

If interested, a quick demo here https://www.loom.com/share/c0c618ed756d46d39f0e20c7feec996d

muvaf[at]limrun[dot]com

by jmull on 2/10/2026, 4:07 PM

Regarding the shift away from time spent on agriculture over the last century or so..

> That was a net benefit to the world, that we all don't have to work to eat.

I’m pretty sure almost all of us are still working to have food to eat and shelter for ourselves and our families.

Also, while the on-going industrial and technological revolution has certainly brought benefits, it’s an open question as to whether it will turn out to be a net benefit. There’s a large-scale tragedy of the commons experiment playing out and it’s hard to say what the result will be.

by uludag on 2/9/2026, 8:08 PM

> I am having more fun programming than I ever have, because so many more of the programs I wish I could find the time to write actually exist. I wish I could share this joy with the people who are fearful about the changes agents are bringing.

It might be just me, but this reads as very tone-deaf. From my perspective, CEOs are foaming at the mouth to make as many developers redundant as possible, and they are not being shy about this desire. (I don't see this at all as inevitable, but tech leaders have made their position clear.)

Like, imagine the smugness of some 18th-century "CEO" telling an artisan, despite the fact that he'll be consigned to working in horrific conditions in a factory, not to worry and to think of all the mass-produced consumer goods he may enjoy one day.

It's not at all a stretch of the imagination that current tech workers may be in a very precarious situation. All the slopware in the world wouldn't console them.

by zerotolerance on 2/13/2026, 7:10 PM

There is a lot of "my" floating around in this article. I always love getting peeks into experiences with this sort of thing, but I think the "mys" highlight something I've seen every day. These agents are really great at bespoke personal flows that build up a TON of almost personal tribal knowledge about how things get done if there is any consistency to those flows at all. Doing this in larger theaters is much more difficult because tribal knowledge is death for larger teams. It drives up the cost of everything which is why individuals or extremely new small teams feel so much more productive. Everything is new here and consistency doesn't matter yet.

by dsign on 2/9/2026, 7:33 PM

Look, I'm very negative about this AI thing. I think there is a great chance it will lead to something terrible and we will all die, or worse. But on the other hand, we are all going to die anyway. Some of us, the lucky ones, will die of a heart attack and will learn of our imminent demise in the second it happens, or not at all. The rest of us will have it worse. It has always been like that, and it has only gotten more devastating since we started wearing clothes and stopped being eaten alive by a savanna crocodile or freezing to death during the first snowfall of winter.

But if AI keeps getting better at code, it will produce entire in-silico simulation workflows to test new drugs or even to design synthetic life (which, again, could make us all die, or worse). Yet there is a tiny, tiny chance we will use it to fix some of the darkest aspects of human existence. I will take that.

by Havoc on 2/10/2026, 12:51 PM

Disagree with the point about anything less than Opus being harmful to learning.

Much of my learning still requires experimentation - including lots of token volume so hitting limits is a problem.

And secondly, I’m looking for workflows that build the thing without needing to be at the absolute edge of LLM capability. That's where fragility and unpredictability live: where a new model with a slightly different personality is released and breaks everything. I’d rather have a flow that is simple and idiot-proof and doesn’t fall apart at the first sign of non-bleeding-edge tokens. That means skipping the gains from something Opus could one-shot, of course, but that’s acceptable to me.

by gip on 2/10/2026, 6:17 AM

> In 2026, I don't use an IDE any more.

I don't think that is the best way to look at it. I think that now every team has the power to build and maintain an internal agent (tool + UX) to manage software products. I don't necessarily think chat-only is enough except for small projects, so teams will build agents that give them access to the level of abstraction that works best.

It's one data point, but this weekend (i.e. in 2 days) I built a desktop + web agent that is able to help me reason about system design and code. Built with Codex, powered by the Codex SDK. It is high quality. I've been a software engineer and director of engineering for 10 years. I'm blown away.

by nickcw on 2/10/2026, 8:48 AM

> Some believe AI Super-intelligence is just around the corner (for good or evil). Others believe we're mistaking philosophical zombies for true intelligence, and speedrunning our own brainrot

Not sure which camp I'm in, but I enjoyed the imagery.

by dent9 on 2/11/2026, 5:58 PM

> I am extremely out of touch with anti-LLM arguments

Wow I know that feel.

I'm here using LLM for daily work and even hobbies in very conservative manners and didn't think much of it.

Now when I have casual discussions with other folks, especially non-tech people, the visceral hatred I get for even mentioning AI and the fact that I use it is insane. There's an entire subgroup of people who are so out of touch with these tools that they think they're the devil, like the anti-GMO crazies and the PETA psychos.

by mtlynch on 2/10/2026, 1:59 PM

> Along the way I have developed a programming philosophy I now apply to everything: the best software for an agent is whatever is best for a programmer.

I agree with this and I think it's funny to see people publish best practices for working with AI that are like, "Write a clear spec. Have a style guide. Use automated tests."

I'm not convinced it's 100% true because I think there are code patterns that AI handles better than humans and vice versa. But I think it's true enough to use as a guiding philosophy.

by dmos62 on 2/10/2026, 7:09 AM

> the best software for an agent is whatever is best for a programmer

My conclusion as well. It feels paradoxical, maybe because on some level I still think of an LLM as some weird gadget, not a coworker. Context ephemerality is more or less the only veritable difference from a human programmer, I'd say. And, even then, context introduction with LLMs is a speedrun of how you'd do it with new human members of a project. Awesome times we live in.

by conartist6 on 2/10/2026, 2:59 PM

IDEs are going to come roaring back.

As the author says, there's nothing wrong with the idea of the IDE. Of course you want to be using the best, most powerful tools!

AI showed us that our current-gen text-editor-first IDEs are massively underserving the needs of the public, yes, but it didn't really solve that problem. We still need better IDEs! What has changed is that we now understand how badly we need them. (source: I am an IDE author)

by jFriedensreich on 2/10/2026, 10:56 AM

It's funny how many variations of meaning people assign to agent-related terms. Conflating "agent" with the CLI, as the opposite end of a spectrum from the IDE, is a new one I had not encountered before. I run agents with vscode-server, also in a VM, and would not give up the ability to have a proper GUI anytime I feel like it; being able to switch seamlessly between more autonomous operation and more interactive work seems useful at any level.

by piokoch on 2/10/2026, 12:17 PM

"In 2026, I don't use an IDE any more."

Just a question: what IDE feature is obsolete now? The ability to navigate the code? Integration with databases, Docker, JIRA, GitHub (like having PR comments available, listed, etc.), Git? Working with remote files? Building the project?

Yes, I can ask Copilot to build my project and verify test results, but it will eat a lot of tokens and the added value is almost none.

by imron on 2/10/2026, 7:31 AM

> By far the greatest IDE I have ever used was Visual Studio C++ 6.0 on Windows 2000

Visual C++ 6 was incredible! My favourite IDE of all time too.

by emmawirt on 2/9/2026, 7:22 PM

Curious what you mean by "agent harness" here... are you distinguishing between true autonomous agents (model decides next step) vs workflows that use LLMs at specific nodes? I've found the latter dramatically more reliable for anything beyond prototyping, which makes me wonder if the "model improvement" is partly better prompting and scaffolding.

by vedhant on 2/10/2026, 2:15 PM

The sandboxing pain is real. Sadly, a fresh VM seems like the simplest viable solution. I don't think the masses are doing any sandboxing at all. We really need a sandbox solution that is dynamic and doesn't pester the user with allow/deny requests. It has to be intelligent and keep up with the LLM agents.

by sandgrownun on 2/10/2026, 3:11 PM

I agree with his assessment up until this point in time; it is where we currently are. But it seems to me there is still a large chunk of engineers who don't extrapolate capability out to the engineer being taken out of the loop completely. IMO, it happens in fairly short order: 2-3 years.

by 63 on 2/9/2026, 7:37 PM

I have no problem with experienced senior devs using agents to write good code faster. What I have a problem with is inexperienced "vibecoders" who don't care to learn and instead use agents to write awful buggy code that will make the product harder to build on even for the agents. It used to be that lack of a basic understanding of the system was a barrier for people, but now it's not, so we're flooded with code written by imperfect models conducted by people who don't know good from bad.

by gurjeet on 2/10/2026, 5:23 PM

> By far the greatest IDE I have ever used was Visual Studio C++ 6.0 on Windows 2000. I have never felt like a toolchain was so complete and consistent with its environment as there.

+1. I've tried many times, and failed, to replicate the joy of using that toolchain.

by only2people on 2/10/2026, 8:31 AM

>this is why I'm building

My clipart folder of that kid with the lollipop continues to stay relevant

by Herring on 2/9/2026, 7:27 PM

> In 2000, less than one percent lived on farms and 1% of workers are in agriculture. That was a net benefit to the world, that we all don't have to work to eat.

The jury's still out on that one, because climate change is an existential risk.

by thegrim000 on 2/10/2026, 1:00 PM

"Eight more months of Bitcoin. Its usage continues to dramatically expand. The number of transactions is increasing exponentially. Soon, fiat currencies will collapse, all replaced by Bitcoin transactions. If you haven't converted your assets over to Bitcoin, you're going to be left behind and lose it all. I can't even understand people who don't see the obvious technical superiority of Bitcoin; such people are going to go through rough times."

by jeffrallen on 2/10/2026, 6:08 AM

Listen to this guy. I've been using his code for a long time, and it works. I am a happy customer of his service, and it works. I listen to his advice and it works.

by redkoala on 2/10/2026, 10:19 PM

What are those 3 sentences that the author typed to replicate Stripe for his situation?

by ares623 on 2/10/2026, 8:38 AM

guys it's an ad

by panny on 2/10/2026, 5:50 AM

I see a lot of people here saying things like:

>ah, they're so dumb, they don't get it, the anti-LLM people

This is one of the reasons I see AI failing in the short term. If I call you an idiot, are you more or less likely to be open-minded and try what I'm selling? AI isn't making money; 95% of companies are failing with AI.

https://fortune.com/2025/08/18/mit-report-95-percent-generat...

I mean, your AIs might be a lot more powerful if they were generating money, but that's not happening. I guess being condescending to 95% of potential buyers isn't really working out.

by MrSandingMan on 2/10/2026, 9:23 AM

> In 2000, less than one percent lived on farms and 1% of workers are in agriculture. That was a net benefit to the world, that we all don't have to work to eat.

Not obvious

> To me that statement is as obvious as "water is wet".

Well... is water *wet* or does it *wet things*? So not obvious either.

I'm really dubious when reading posts posing some things as obvious or trivial. In general they are not.

by Krei-se on 2/9/2026, 7:34 PM

The author has a github.

by hoistbypetard on 2/10/2026, 7:10 AM

> To me that statement is as obvious as "water is wet".

Water is not wet. Water makes things wet. Perhaps the inaccuracy of that statement should be taken as a hint that the other statements that you hold on the same level are worthy of reconsideration.

by almostdeadguy on 2/9/2026, 6:59 PM

In the past couple days I've become less skeptical of the capabilities of LLMs and now more alarmed by them, contra the author. I think if we as a society continue to accept the development of LLMs and the control of them by the major AI companies there will be massively negative repercussions. And I don't mean repercussions like "a rogue AI will destroy humanity" per se, but these things will potentially cause massive social upheaval, a large amount of negative impacts on mental health and cognition, etc. I think if you see LLMs as powerful but not dangerous you are not being honest.