Speed at the cost of quality: Study of use of Cursor AI in open source projects (2025)

by wek on 3/16/2026, 5:07 PM with 67 comments

by rfw300 on 3/16/2026, 5:46 PM

Super interesting study. One curious thing I've noticed is that coding agents tend to increase the code complexity of a project, but simultaneously massively reduce the cost of that code complexity.

If a module becomes unsustainably complex, I can ask Claude questions about it, have it write tests and scripts that empirically demonstrate the code's behavior, and, if worst comes to worst, rip out that code entirely and replace it with something better in a fraction of the time it used to take.

That's not to say complexity isn't bad anymore—the paper's findings on diminishing returns on velocity seem well-grounded and plausible. But while the newest (post-Nov. 2025) models often make inadvisable design decisions, they rarely do things that are outright wrong or hallucinated anymore. That makes them much more useful for cleaning up old messes.

by matt_heimer on 3/16/2026, 5:59 PM

Yes, it's not surprising that warnings and complexity increased at a higher rate when paired with increased velocity. Increased velocity == increased lines of code.

Does the study normalize velocity between the groups by adjusting the timeframes so that we could tell if complexity and warnings increased at a greater rate per line of code added in the AI group?

I suspect it would, since I've had to simplify AI-generated code on several occasions. Right now, though, the study just seems to say that the larger a codebase grows, the more complex it gets, which is obvious.
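The normalization being asked for is simple to compute if per-repo deltas were available. A minimal sketch (the numbers and field names are illustrative, not from the paper):

```python
def warnings_per_kloc(warnings_added: int, lines_added: int) -> float:
    """Rate of new static-analysis warnings per 1,000 lines of code added."""
    if lines_added <= 0:
        raise ValueError("lines_added must be positive")
    return 1000 * warnings_added / lines_added

# Hypothetical repo-level deltas over the same post-adoption window.
ai_rate = warnings_per_kloc(warnings_added=45, lines_added=12_000)   # 3.75
control_rate = warnings_per_kloc(warnings_added=20, lines_added=8_000)  # 2.5
```

If the AI group's rate stays higher even after dividing by lines added, the finding would be about warning density, not just "more code, more warnings".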

by keeda on 3/16/2026, 9:31 PM

There are actually quite a few studies out there that look at LLM code quality (e.g. https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=LLM+...) and they mostly have similar findings. This reinforces the idea that LLMs still require expert guidance. Note that some of these studies date back to 2023, which is eons ago in terms of LLM progress.

The conclusion of this paper aligns with the emerging understanding that AI is simply an amplifier of your existing quality assurance processes: higher discipline results in higher velocity, lower discipline results in lower stability (e.g. https://dora.dev/research/2025/). Having strong feedback and validation loops is more critical than ever.

In this paper, for instance, they collected static analysis warnings using a local SonarQube server, which implies that SonarQube was not integrated into the projects they studied, so these warnings were never available to the agent. It's highly likely that if the warnings were fed back into the agent, it would fix them automatically.
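That feedback loop is easy to sketch. The following assumes SonarQube's `/api/issues/search` web API endpoint and a token; the exact response fields can vary by SonarQube version, so treat the shape of each issue dict as an assumption:

```python
import json
import urllib.request

def fetch_open_issues(server: str, project_key: str, token: str) -> list[dict]:
    """Pull unresolved issues for a project from a SonarQube server.

    Assumes the /api/issues/search endpoint; field names below
    (component, line, severity, message) match common versions.
    """
    url = f"{server}/api/issues/search?componentKeys={project_key}&resolved=false"
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("issues", [])

def issues_to_prompt(issues: list[dict]) -> str:
    """Format warnings into a prompt a coding agent can act on."""
    lines = ["Fix the following static-analysis warnings:"]
    for issue in issues:
        lines.append(f"- {issue['component']}:{issue.get('line', '?')} "
                     f"[{issue['severity']}] {issue['message']}")
    return "\n".join(lines)
```

Running the formatter's output through the agent after each change would close the loop the study found missing.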

Another interesting point from the conclusion: the metrics we use for humans may not apply to agents. My go-to example is code duplication (even though this study finds minimal increase in duplication) -- it may actually be better for agents to rewrite chunks of code from scratch than to use a dependency whose code is not available, which forces them to rely on natural-language documentation that may or may not be sufficient or even accurate. What is tech debt for humans may actually be a boon for agents.

by Slav_fixflex on 3/17/2026, 10:15 AM

Interesting findings. I use AI agents (Claude, Windsurf) exclusively to build production software without being a developer myself. Speed is real but so is context drift – the AI breaks unrelated things while fixing others. Git became essential for me because of this.

by mentalgear on 3/16/2026, 7:01 PM

> We find that the adoption of Cursor leads to a statistically significant, large, but transient increase in project-level development velocity, along with a substantial and persistent increase in static analysis warnings and code complexity. Further panel generalized-method-of-moments estimation reveals that increases in static analysis warnings and code complexity are major factors driving long-term velocity slowdown. Our study identifies quality assurance as a major bottleneck for early Cursor adopters and calls for it to be a first-class citizen in the design of agentic AI coding tools and AI-driven workflows.

So overall it seems like the pros and cons of "AI vibe coding" just cancel each other out.

by qcautomation on 3/16/2026, 8:46 PM

The "transient velocity, persistent complexity" finding tracks with what I've seen. But I think the real issue is most people treat AI code like it's done when it runs. It's not. It's a first draft.

I've been using Claude pretty heavily for the last month or so on a few side projects. The speed is genuinely nuts — stuff that would've taken me a weekend gets roughed out in an afternoon. But if I just commit what it gives me, the codebase turns into this weird verbose mess where everything kinda works but nothing is clean.

What actually helped was treating AI output the same way you'd treat code from a fast but sloppy contractor: run it, write real tests (not the fluffy ones it generates that mock everything into oblivion), then refactor before committing. Basically the build-test-refactor-commit cycle that dalemhurley mentioned. The models are good enough to generate that first pass, but you still need taste to shape it.
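The difference between a fluffy generated test and a real one is concrete: one asserts that mocks were called, the other asserts on behavior. A small illustration (the `apply_discount` function is a made-up example, not from any project here):

```python
from unittest.mock import MagicMock

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent out of range")
    return round(price * (1 - percent / 100), 2)

# Fluffy: replace the function with a mock, then assert the mock was
# called. This passes no matter what apply_discount actually does.
fake = MagicMock(return_value=90.0)
fake(100.0, 10)
fake.assert_called_once_with(100.0, 10)  # proves nothing about behavior

# Real: exercise actual behavior, including the edge cases.
assert apply_discount(100.0, 10) == 90.0
assert apply_discount(19.99, 0) == 19.99
try:
    apply_discount(10.0, 150)
except ValueError:
    pass  # invalid input correctly rejected
else:
    raise AssertionError("expected ValueError")
```

The mocked version is the kind agents tend to generate unprompted; the second kind is what survives a refactor.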

The 9% complexity increase makes total sense if people skip that step. Which, let's be honest, most of us do when we're moving fast.

by bisonbear on 3/17/2026, 3:01 AM

Really interesting study. One thing I keep coming back to is that tests have no way of catching this sort of tech debt. The agent can introduce something that will make you rip your hair out in 6 months, but tests are green...

My theory is that at least some of this is solvable with prompting / orchestration - the question is how to measure and improve that metric. i.e. how do we know which of Claude/Codex/Cursor/Whoever is going to produce the best, most maintainable code *in our codebase*? And how do we measure how that changes over time, with model/harness updates?
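One cheap way to get such a metric is to track a complexity score per commit for each agent on the same codebase. A crude stdlib-only sketch (real analyzers like SonarQube or radon are more nuanced, but the trend line is what matters):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Rough cyclomatic complexity of a Python module: 1 plus one per
    branching construct found in the AST."""
    branches = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)
    return 1 + sum(isinstance(node, branches)
                   for node in ast.walk(ast.parse(source)))

sample = (
    "def f(x):\n"
    "    if x > 0 and x < 10:\n"
    "        return 1\n"
    "    return 0\n"
)
score = cyclomatic_complexity(sample)  # one `if` + one `and` -> 3
```

Summed over every file at every commit, this gives a per-model time series for one maintainability dimension; it won't catch every kind of tech debt, but it does make "which harness bloats our code" measurable.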

by woeirua on 3/16/2026, 10:28 PM

This study's cutoff date was August 2025. I don't think this result is surprising given the level of coding agent ability back then. The whole thing just shows how out-of-date academic publishing is on this subject.

>This yields 806 repositories with adoption dates between January 2024 and March 2025 that are still available on GitHub at the time of data analysis (August 2025).

There were very few people who thought that coding agents worked very well back then. I was not one of them, but I _do_ think they work today.

by AstroBen on 3/16/2026, 6:12 PM

They're measuring development speed through lines of code. For that to be a valid measure, they'd first need to show that AI and humans use the same number of lines to solve the same problem. That hasn't been my experience at all: AI is incredibly verbose.

Then there's the question of whether LoC is a reliable proxy for velocity at all. The common belief among developers is that it's not.

by dalemhurley on 3/16/2026, 7:24 PM

I think the issue is that people AI-generate code, test, then commit.

Traditional software dev would be build, test, refactor, commit.

Even The Clean Coder recommends starting with messy code and then tidying it up.

We just need to apply traditional methods to AI assisted coding.

by mellosouls on 3/16/2026, 6:41 PM

Depends on the nature of the tool, I would imagine - e.g. Claude Code's terminal interface (say) would have higher entry requirements in terms of engineering experience (Cursor was sold as newbie-friendly), so I would predict higher-quality code than Cursor's in a similar survey.

ofc that doesn't take into account the useful high-level features and other advantages of IDEs that might mitigate slop during review, but overall Cursor was a more natural fit for vibe-coders.

This is said without judgement - I was a cheerleader for Cursor early on until it became uncompetitive in value.

by chris_money202 on 3/16/2026, 7:05 PM

Now someone do a research study where a summary of this paper is in the AGENTS.md, and let's see if the overall outcomes are better.

by PeterStuer on 3/16/2026, 5:56 PM

Interesting from an historical perspective. But data from 4/2025? Might as well have been last century.

by duendefm on 3/16/2026, 7:48 PM

AI is not perfect, sure; one has to know how to use it. But this study is already dated, since models have improved a lot since the beginning of 2026.