With the rise of vibe coding, I'm interested in hearing about some creative ways people are automating their coding work.
One of my biggest unlocks has been embracing Claude Code for web - the cloud version - and making sure my projects are set up to work with it.
I mainly work in Python, and I've been ensuring that all of my projects have a test suite which runs cleanly with "uv run pytest" - using a dev dependency group to ensure the right dependencies are installed.
This means I can run Claude Code against any of my repos and tell it "run 'uv run pytest', then implement ..." - which is a shortcut for having it use TDD and write tests for the code it's building, which is essential for having coding agents produce working code that they've tested before they commit.
Once this is working well I can drop ideas directly into the Claude app on my iPhone and get 80% of the implementation of the idea done by the time I get back to a laptop to finish it off.
I wrote a bit about "uv run pytest" and dependency groups here: https://til.simonwillison.net/uv/dependency-groups
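For reference, here's roughly what that looks like in pyproject.toml (the project metadata is just illustrative). uv installs the dev group by default when you invoke "uv run", so "uv run pytest" picks up pytest automatically:

    [project]
    name = "example-project"
    version = "0.1.0"
    requires-python = ">=3.12"
    dependencies = []

    # PEP 735 dependency group: installed for "uv run", but not a
    # runtime dependency of the package itself.
    [dependency-groups]
    dev = [
        "pytest",
    ]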
If I know what I want to code and it's a purely mechanical exercise to code it, I'll just tell Claude what to do and it does it. Pretty neat.
When I don't know what I want to do, I read existing code, think about it, and figure it out. Sometimes I'll sketch out ideas by writing code, then when I have something I like I'll get Claude to take my sketch as an example and run with it.
The big mistake I see people make is not knowing when to quit. Even with Opus 4.5 it still does weird things, and I've seen people end up arguing with Claude or trying to prompt engineer their way out of things when it would have been maybe 30 seconds of work to fix things manually. It's like people at shopping malls who spend 15 minutes driving in the parking lot to find a spot close to the door when they could have parked in the first spot they saw and walked to the door in less than a minute.
And as always, every line of code was written by me even if it wasn't written by me. I'm responsible for it, so I review all of it. If I wouldn't have written it on my own without AI assistance I don't commit it.
I actually kind of do the opposite to most developers.
Instead of having it write the code, I try to use it like a pair reviewer, critiquing as I go.
I ask it questions like "is it safe to pass null here", "can this function panic?", etc.
Or I'll ask it for opinions when I second guess my design choices. Sometimes I just want an authoritative answer to tell me my instincts are right.
So it becomes more like an extra smart IDE.
Actually writing code shouldn't be that mechanical. If it is, that may signify a lack of good abstractions. And some mechanical code is actually quite satisfying to write anyway.
ML Engineer here. For coding, I mostly use Cursor/Claude Code as a fast pair. I'll detail what I want at a high level, let it draft, then I incrementally make changes.
Where I've automated more aggressively is everywhere around the code. My main challenge was running experiments repeatedly across different systems and keeping track of the various models I ran and their metrics, etc. I started using Skyportal.ai as an ops-side agent. For me, it's mostly: take the training code I just iterated on, automatically install and configure the system with the right ML stack, run experiments via prompt, and see my model metrics from there.
It’s usually just a slightly faster web search. When I try to have it do more, I end up spinning my wheels and then doing a web search.
I’ll sometimes have it help read really long error messages as well.
I got it to help me fix a reported security vulnerability, but it was a long road and I had to constantly work to keep it from going off the rails and adding insane amounts of complexity and extra code. It likely would have been faster for me to read up on the specific vulnerability, take a walk, and come back to my desk to write something up.
I find it most useful for getting up to speed on new libraries quickly, and for bouncing design ideas around. I'll lay out what my goals are and the approaches I'm considering, and ask it to poke holes in them or to point out issues or things to keep in mind. Found it shockingly helpful in covering my blind spots.
For my real work? It has not been helpful so far.
For side projects? It's been a 10x+ multiplier.
On the other side of the equation I've been spending much more time on code-review on an open source project I maintain, because developers are much more productive and I still code-review at the same speed.
The real issue is that I can't trust the AI generated code, or trust the AI to code-review for me. Some repeated issues I see:
- In my experience the AI doesn't integrate well with the code that is already there: it often rewrites functionality and tends not to adhere to the project's conventions, instead using what it was trained on.
- The AI often lacks depth on more complex issues. And because it doesn't see the broader implications of changes, it often doesn't write the tests that would cover them. The developers who wrote the PRs accept the AI's tests without much investigation into the code-base. Since the changes pass the (also insufficient) tests, they send the PR to code-review.
- With AI I think (?) I'm more often the one doing the careful deep dive into the project and re-designing the generated code during code review. In a way it's an indirect re-prompting.
I'm very happy with the increased PRs: they push the project forward with great ideas of what to implement, and I'm very happy about the productivity boost AI brings. Also, with AI, developers are bolder in their contributions.
But this doesn't scale -- or I'll spend all my time code-reviewing :) I hope the AIs get better quickly.
We're overlooking a critical metric in AI-assisted development: Token and Context Window to Utility Ratio.
AI coding tools are burning massive token budgets on boilerplate: thousands of tokens just to render simple interfaces.
Consider the token cost of "Hello World":
- Tkinter: `import tkinter as tk; tk.Button(text="Hello").pack()`
- React: 500MB of node_modules and dependencies
Right now, context windows are finite and tokens are costly. What do you think?
My prediction is that tooling that manages token and context efficiency will become essential.
At work, unfortunately (?), we don't use any AI, but there is movement to introduce it in some form (it's a heavily regulated area, so it won't be YOLO coding with an agent, for sure).
But my side projects, which I kinda abandoned a long time ago, are getting a second life, and it's really fun just to direct the agent instead of slowly re-acquiring all of the knowledge and wasting time typing all the stuff into the computer.
I'm not. I'm learning a little bit each day, making my brain better and myself more productive as I go.
We use beads for everything. We label them as "human-spec" needed if they are not ready to implement. We label them as "qa-needed" if they cannot be verified through automatic tests.
I wrote beads-skills for Claude that I'll release soon to enforce this process.
2026 will be the year of agent orchestration for those of us who are frustrated having 10 different agents to check on constantly.
gastown is cool but too opinionated.
I'm excited about this promising new project: https://github.com/jzila/canopy
We're writing an internal tool to help with planning, which most people don't think is a problem but I think is a serious one. Most plans are too long, and/or you end up repeating yourself.
Instead of building features from scratch, I basically rebuilt my open-source alternatives project[0] from five years ago to track features right in the code itself. So if you spot an open-source project with a feature you want in your app, you can create a "skill" that points exactly to where it lives: specific files, functions, modules, plus docs and notes. This turns OSS into a modular cookbook you can pull from across stacks.
AI excels at finding the "seams," those spots where a feature connects to the underlying tech stack, and figuring out how the feature is really implemented. You might think just asking Claude or Cursor to grab a feature from a repo works, but in practice they often miss pieces because key code can be scattered in unexpected places. Our skills fix that by giving structured, complete guides so the AI ports it accurately. For example, if an e-commerce platform has payments built in and you need payments in your software, you can reference the exact implementation and adapt it reliably.
Well, I built https://github.com/rcarmo/agentbox - which I run in a VM, with a dedicated container per project. Right now most of them are running Copilot CLI, others Mistral Vibe or Toad, and I have just built a small web front end based on https://github.com/rcarmo/textual-webterm that lets me see a dashboard of all running tmux sessions inside the containers, click through, and prompt the agents every hour or so.
Looks like this: https://mastodon.social/@rcarmo/115937685095982965
SyncThing syncs their workspaces to my desktop/laptop, I make small adjustments (I don’t do stupid wasteful things like the Ralph approach, I prefer clear SPEC documents and TODO checklists, plus extensive testing and switching models for doing code audits on each other), my changes sync back, etc.
I’m considering calling it “million monkeys”, really.
The biggest unlock for me happened because of Claude Code. It has allowed me and my team to ship features much faster. I use it to work on the main product, but also to build a lot of in-house tools. We have observability and testing tools for our AI agent that helps us improve the main product. With Claude we can iterate much faster on them and add the features that we specifically need, rather than using off-the-shelf products. These are not user-facing products/code bases, so maybe the quality bar is not that high. But without Claude we would have taken resources away from working on the main product to build them.
Testing is not yet the main focus, so we haven't looked into automating that. But we will in the future. We have automated most of our documentation updates, though. After releases or big merges we use askmanu to automatically update/create the docs. These are internal docs, but super useful for us and for Claude.
Disclaimer: I'm the founder of askmanu
I haven't automated anything to be honest, but LLMs are invaluable in connecting dots in repositories or exploring dependencies' source code.
The first saves me days of work per month by sparing me endless pages of paper notes trying to figure out why things work a certain way in legacy work codebases. The second spares me from having to dig too much into partially outdated or missing documentation, or having to melt my brain understanding the architecture of every different dependency.
So I just put the major deps of my projects in a `_vendor` directory that contains the source code of the dependencies, and if I have doubts the LLMs dig into that source and its tests to shed light on things.
What I haven't seen anybody accomplish yet is producing quality software by having AI write it. I'm not saying it can't help here, but the bottleneck is still reviewing, and as soon as you get sloppy, codebase quality goes south, and product quality follows soon after.
I recently took a job maintaining and extending the functionality of an Enterprise Asset Management product through its own scripting and XML-soup ecosystem. Since it is such a closed system with a much smaller corpus of documentation and examples, the AI has been great at using what it does know to help me navigate and understand the product as a whole, and how to think about things in terms of how the product works behind the scenes, in the code I cannot see.
It doesn't write the code for me, but I talk to it like it is a personal technical consultant on this product and it has been very helpful.
With Happy (https://blog.denv.it/posts/im-happy-engineer-now/)
Each project gets its own share of supervision, depending on how critical human intervention is.
I have some large, complex, strict-compliance projects where the AI is a pair programmer but I make most of the decisions, and I have smaller projects that, despite having a great impact on the bottom line, can be done entirely unsupervised due to the low risk of "mistakes" and the ease of correcting them after the fact, since they are caught by the AI as well.
I prepare custom AGENTS.md with the help of https://lynxprompt.com (Disclaimer: I'm the dev)
The more time you spend making guidelines and guardrails, the more success the LLM has at acing your prompt. So I created a wizard to get it right from the beginning, simplifying things and "guiding" you into thinking through what you want to achieve.
The most useful automation for me has been a few simple commands. Here are some examples I use for repos in GitHub to resolve issues and PRs.
/gh-issue [issue number]
/gh-pr [pr number]
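These are just custom slash commands. As a rough sketch (not my exact version), the /gh-issue one is a markdown prompt file at .claude/commands/gh-issue.md along these lines, where $ARGUMENTS is whatever you type after the command:

    Fetch GitHub issue #$ARGUMENTS with "gh issue view $ARGUMENTS --comments",
    read the relevant code and comments, implement a fix on a new branch,
    run the test suite, and open a PR that references the issue.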
Edit: replaced links to the private GitHub repo with pastebin links.
I have used custom code generators at work for 25+ years.
The generators typically generate about 90% of the code I need to write a biz app, leaving the most important code to me: the biz logic.
No AI. Just code that takes a (simple) declarative spec file and generates Typescript/C++/Java/... code.
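To make the idea concrete, here is a toy sketch in Python (not my actual generator, and the spec format is made up): it reads a small declarative spec from stdin and emits a TypeScript interface, leaving the biz logic to be written by hand.

    #!/usr/bin/env python3
    # Toy generator: turn a declarative entity spec into a TypeScript interface.
    import json
    import sys

    TS_TYPES = {"string": "string", "int": "number", "float": "number", "bool": "boolean"}

    def generate(spec: dict) -> str:
        lines = [f"export interface {spec['name']} {{"]
        for field, kind in spec["fields"].items():
            lines.append(f"  {field}: {TS_TYPES[kind]};")
        lines.append("}")
        return "\n".join(lines)

    if __name__ == "__main__":
        # Example spec: {"name": "Invoice", "fields": {"id": "int", "customer": "string", "paid": "bool"}}
        print(generate(json.load(sys.stdin)))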
I am also using AIs daily. However, the code generators are still generating more productivity for me than AIs ever have.
I use CLAUDE.md to describe the project. I use Claude to help write the spec. Then I let it run. When the context gets too crammed, I have it build SKILLS.md. I’ll probably have it rewrite CLAUDE.md after a while. Then it will write tests, deployment scripts, commit messages. Yeah, everything.
By using the *API*
True, I spent a year making a platform for using the API... but the results are stupendous!! Very cheap and unlimited access, custom tooling, etc., to get large amounts done of anything you want to do with an LLM!
I think of the biggest chunk of the task that I expect the currently available models to do well. I try to describe it precisely and give it all the relevant content by uploading the relevant code files. Then I hit enter.
I'm using a modified version of open Dev. It's just a chat interface for OpenRouter. I load a combo box with all the free models. It's fast, and it's on a tab where it can't hurt my code.
Just imagine when we have quantum computing. Unlimited context windows, instant answers.
What would you make if you could make anything? Does it all just lose meaning?
I find it very useful to make quick CLI scripts to pipe data in and out of.
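As an example of what I mean (a made-up script, not one of mine), the pattern is just a small filter that reads stdin and writes stdout so it composes with other tools:

    #!/usr/bin/env python3
    # Read JSON lines from stdin, keep only error records, write them back out.
    import json
    import sys

    for line in sys.stdin:
        record = json.loads(line)
        if record.get("level") == "error":
            sys.stdout.write(json.dumps(record) + "\n")

Then it slots into a pipeline like "cat app.log | ./errors_only.py | wc -l".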
opencode, then save your important dev info to AGENTS.md
The biggest principle is codification. Codify everything.
For instance, this skill for web development: https://raw.githubusercontent.com/vercel-labs/web-interface-...
That’s too much for a model to carry in its context while it’s trying to do actual work.
Far better is to give that skill.md to a model and have it produce several hundred lines of code with a shebang at the top. Now you haven’t got a skill, you’ve got a script. And it’s a script the model can run any time to check its work, without knowing what the script does, how, or why - it just sees the errors. Now all your principles of web dev can be checked across your codebase in a few hundred milliseconds while burning zero tokens.
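As an illustration (a hypothetical check, not that Vercel skill turned into code), the shape of such a script is: scan the codebase, print any violations, exit non-zero so the agent sees them.

    #!/usr/bin/env python3
    # Hypothetical codified guideline: every <img> in the frontend needs an alt attribute.
    import pathlib
    import re
    import sys

    errors = []
    for path in pathlib.Path("src").rglob("*.tsx"):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if re.search(r"<img\b(?![^>]*\balt=)", line):
                errors.append(f"{path}:{lineno}: <img> without alt text")

    print("\n".join(errors) or "all checks passed")
    sys.exit(1 if errors else 0)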
TDD is codification too: codifying in executable form the precise way you want your logic to work. Enforce a 10ms timeout on every unit test and as a side effect your model won’t be able to introduce I/O or anything else that prevents parallel, randomized execution of your test suite. It’s awesome to be able to run ALL the tests hundreds of times per day.
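One minimal way to codify that budget in pytest is a hand-rolled autouse fixture in conftest.py (a sketch; the pytest-timeout plugin is another option):

    # conftest.py
    import time
    import pytest

    @pytest.fixture(autouse=True)
    def ten_ms_budget():
        # Flag any test that takes longer than 10 ms, which effectively bans
        # network calls, disk I/O, and sleeps from the unit suite.
        start = time.perf_counter()
        yield
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > 10:
            pytest.fail(f"test took {elapsed_ms:.1f} ms, over the 10 ms budget")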
Constantly checking your UI matches your design system? Have the model write a script that looks at your frontend codebase and refuses to let the model commit anything that doesn’t match the design system.
Codification is an insanely powerful thing to build into your mindset.