Putting yourself in a situation where this could happen is kinda insane, right? Could be something I'm missing.
I can't think of any specific example where I would let any agent touch a production environment, least of all the data. AI aside, it makes sense to do any major change in a dev/staging/preview environment first.
Not really sure what the lesson would be here. Don't punch yourself in the face repeatedly?
> If you found this post helpful, follow me for more content like this.
> I publish a weekly newsletter where I share practical insights on data and AI.
> It focuses on projects I'm working on + interesting tools and resources I've recently tried: https://alexeyondata.substack.com
It's hard to take the author seriously when this immediately follows the post. I can only conclude that this post was for the views not anything to learn from or be concerned about.
An engineer recklessly ran untrusted code directly in a production environment. And then told on himself on Twitter.
I'm not going to "defend" the LLM here, but this:
> I forgot to use the state file, as it was on my old computer
indicates that this person did not really know what they were doing in the first place. I honestly think using an LLM to do the Terraform setup in the first place would probably have led to better outcomes. Quite funny that the author followed up with this tweet:
> If you found this post helpful, follow me for more content like this.
> I publish a weekly newsletter where I share practical insights on data and AI.
Despite multiple comments blaming the AI agent, I think it's the lack of backups that's the problem here, right? With backups, almost any destructive action can be rolled back, whether it's from a dumb robot, a mistaken junior, or a sleep-deprived senior. Without them, you're just running the clock waiting for disaster.
Well, apparently the guy was running tf from his computer and asked Claude to apply changes without providing the state file, and is "blaming" Claude for the catastrophic result?
I'm cool with blogging about your mess-ups, sort of. Is "I'm incompetent" a good content strategy though? Yeah, you're going to get a lot of traffic to that post, but what are you signaling? Your product is a thousand bucks a year. I would not go near it.
I'm absolutely loving the genre of "chatbot informing user it messed up real bad":
> CRITICAL: Everything was destroyed. Your production database is GONE. Let me check if there are any backups:
> ...
> No snapshots found. The database is completely lost.
And this guy is selling tech courses.
I'm no AI advocate, but I have been using it for six months now; it's a very powerful tool, and powerful tools need to be respected. Clearly this guy has no respect for his infrastructure.
The screenshot he has, "Let me check if there are backups", is a typical example of how lazy people use AI.
The point about keeping `terraform apply` human-only is the right instinct, but I think the conversation is missing a structural layer.
The real issue isn't "don't let agents touch prod." It's that terraform destroy and terraform apply share the same permission scope. There's no blast radius boundary. If an agent (or a tired human) can run `apply`, they can destroy anything in that state file. renewiltord touched on this too, always forward-evolving infra instead of running destroy directly.
I got curious about this pattern and started thinking about it differently: what if agent-accessible Terraform operations were scoped to a read-only plan phase by default? The agent generates the plan, a human reviews the diff, and only then does apply run. You'd essentially treat the agent like a junior engineer who can write PRs but can't merge to main.
The deeper problem is that most IaC tools weren't designed with the assumption that the operator might be an LLM with no concept of irreversibility.
Has anyone implemented a plan-only mode for agents that's actually held up in practice?
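A minimal sketch of that plan-only gate, assuming agent shell commands pass through a wrapper (the function name and allowed set are my own, not an existing tool):

```python
import shlex
import subprocess

# Read-only Terraform subcommands the agent may run unattended;
# apply/destroy/import/state all require a human.
AGENT_ALLOWED = {"plan", "show", "validate", "fmt", "output"}

def run_terraform(command: str) -> subprocess.CompletedProcess:
    """Execute a terraform command only if its subcommand is read-only."""
    args = shlex.split(command)
    if not args or args[0] != "terraform":
        raise ValueError("only terraform commands pass through this gate")
    # First non-flag token after "terraform" is the subcommand.
    sub = next((a for a in args[1:] if not a.startswith("-")), None)
    if sub not in AGENT_ALLOWED:
        raise PermissionError(f"'{sub}' requires human review and execution")
    return subprocess.run(args, capture_output=True, text=True)
```

The agent writes the plan, a human reads the diff and runs apply themselves, which is exactly the "junior engineer who can open PRs but can't merge" model.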
That's why you tell CC to do a "terraform plan" to verify it's not wrecking critical infrastructure and NEVER vibe-code infrastructure.
Why would you do this?
> Make no backups
> Hand off all power to AI
> Post about it on twitter
> "Teaching engineers to build production AI systems"
This has to be ragebait to promote his course, no?
Terraform + Claude Code almost got me too. Had it run `terraform destroy` on a staging environment two weeks ago because it "cleaned up unused resources." Luckily staging, not prod. After that I started routing all agent shell commands through Daedalab, it intercepts anything destructive before execution. Would've saved you the heartache here.
I love the guy's Twitter bio:
>Teaching engineers to build production AI systems
>100,000+ learners
No.
YOU wiped your production database.
YOU failed to have adequate backups.
YOU put Claude Code forward as responsible, but it's just a tool.
YOU are responsible, not "the AI did it!"
I don't use Terraform much anymore because I don't need it, but that's not how you use it.
Always forward-evolve infra. Terraform apply to add infra; then remove the definition and terraform apply to destroy it. There's no use in running terraform destroy directly on a routine basis.
Also, I assume you defined RDS snapshots in the same state? This is clearly erroneous. It means a malformed apply, human or agent, results in snapshot deletion.
The use of terraform destroy is a footgun waiting for a tired human to destroy things. The lesson has nothing to do with the agent.
This is exactly the problem I built CRE (Claude Rule Enforcer) to solve. It's a two-layer enforcement gate that sits between your AI agent and the system:
Layer 1: Regex matching. Blocks `rm -rf`, force push, `terraform destroy`, `DROP TABLE` in under 10ms. No LLM call. Just a wall.
Layer 2: LLM advisory. For grey areas, a lightweight model reads the conversation context and checks whether the user actually asked for that action.
It hooks into Claude Code's PreToolUse event, so destructive commands get blocked before execution. Also works with Cursor, Windsurf, Cline, Aider.
The key insight: the company selling you the agent shouldn't also be the one providing safety. CRE is independent.
Open source, 790+ tests. UK Patent GB2604445.3.
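The regex-wall idea is simple enough to sketch independently; these patterns are illustrative, not the tool's actual rules, and a real deny-list would be far more thorough:

```python
import re

# Illustrative deny-list of destructive shell/SQL fragments.
DENY_PATTERNS = [
    re.compile(r"\brm\s+(-[a-z]*r[a-z]*f|-[a-z]*f[a-z]*r)\b"),  # rm -rf / rm -fr
    re.compile(r"\bgit\s+push\b.*(--force\b|-f\b)"),            # force push
    re.compile(r"\bterraform\s+destroy\b"),
    re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE),  # DROP TABLE/DATABASE
]

def is_blocked(command: str) -> bool:
    """Cheap pre-execution check: pattern matching only, no LLM call."""
    return any(p.search(command) for p in DENY_PATTERNS)
```

Hooked into a pre-execution event, this rejects the obvious catastrophes in microseconds and leaves only the grey areas for a slower, context-aware check.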
You wiped your production database. You actively ignored the warnings of your tooling and your backup strategy was bad. Incompetence as content is surging in the last few weeks.
Can someone explain to me why anyone would do this, and then tweet about it? Is he really trying to blame 'AI agents' and 'Terraform'?
Not the first time I've seen vibe coders causing havoc on production systems.
Under no circumstances should you let an AI agent near a production system at all.
Absolutely irresponsible.
Yes, the engineer is at fault, but the instinct to attack him is distracting from the more interesting conversation, which is that AI and agents are making it more complicated to properly set up security. I imagine it will get better over time, but right now, it's much easier to shoot yourself in the foot than ever before.
I've been steering AI tools instead of writing code directly for over a year. This week's Cursor + Claude Code incidents felt like the same pattern to me: tools treating explicit operator intent as negotiable.
Wrote up what guard rails actually hold in production: https://open.substack.com/pub/triduanacelebrer831482/p/the-t...
A good rule of thumb:
- Don't even let dev machines access the infra directly (unless you're super early in a greenfield project): No local deploys, no SSH. Everything should go through either the pipeline or tools.
Why?
- The moment you "need" to do one of these, you've discovered a use case that will most likely repeat.
- By letting every dev rediscover this use case, you'll have hidden knowledge and a multitude of solutions.
In conversation fragments:
- "... let me just quickly check if there's still enough disk space on the instance"
- "Hey Kat, could you get me the numbers again? I need them for a report." "sure, I'll run my script and send them to you in slack" "ah.. Could you also get them for last quarter? They're not in slack anymore"
One of Terraform's most powerful features is that it tells you exactly which resources will change before it makes the changes. The hard part is writing Terraform, not reviewing a diff and running one command. In my workflows I am the one who runs "terraform apply", NOT the agent.
This guy is... an interesting person.
He had a state file somewhere that was aligned to his current infrastructure... why isn't this on a backend, who really knows...
He then ran it without a state file and then ran a terraform apply... whatever could get created would get created, and whatever conflicted with a resource that already existed would fail the pipeline... what's more, he could've just run terraform destroy after letting it finish, and that would've been a much cleaner way to clean up after himself.
Except... he canceled the terraform apply... saw that it created resources, and then tried to guess which resources those were...
I'm sorry, he could've done all of this by himself without any agentic AI. It's PICNIC, 100%.
For all the employment insecurity going around, each day I am more and more confident in myself. I imagine myself ten years from now as one of the 1-in-10 guys left who still actually knows things, or even just reads things. It will be a formidable superpower!
Still, if in ten years I am on the streets, I will at least have spared myself whatever this hell is... I know they deserve it, but I still feel bad for the humans at the center here. How can we blame people, really, when the whole world and their bosses are telling them it's OK? Surely it's a lot of young devs here too. Such a terrible intro to the industry. Not sure I'd ever recover personally.
Wow. For this to happen, there's like 5 levels of sloppiness that need to be (or not be) there.
Good thing the guy is his own boss; I would've fired him immediately and sued for damages as well. This is 100% neglectful behavior.
Your AWS backup snapshots must go one-way (append-only) to a separate AWS account, to which access is extremely limited and which never has any automated tools connecting with anything other than read access. I don't think it costs more to do that, but it takes your backups out of the blast radius of a root or admin account compromise OR a tool malfunction. With AWS DLM, you can safely configure your backup retention in the separate AWS account and not risk any tools deleting them.
Terraform is a ticking time bomb. All it takes is for a new field to show up in AWS or a new state in an existing field, and now your resource is not modified, but is destroyed and re-created.
I will never trust any process, AI or CD pipeline, to execute `terraform apply` automatically on anything production. Maybe if you examine the plan for a very narrow set of changes and then execute apply from that saved plan only, maybe then you can automate it. I think it's much rarer for Terraform to deviate from a plan.
Regardless, you must always turn on Delete Protection on all your important resources. It is wild to me that AWS didn't ship EKS with delete protection out of the gate; they only added this feature in August 2025! Not long before that, I witnessed a production database get deleted because Terraform decided that an AWS EKS cluster could not be modified, so it decided to delete it and re-create it, while the team was trying to upgrade the EKS version. The same exact pipeline worked fine in the staging environment. Turns out production had a slight difference due to AWS API changes, and Terraform decided it could not modify in place.
The use of a state file with Terraform is a constant source of trouble and footguns:
- you must never use a local Terraform state file for production that's not committed to source control
- you must use a remote S3 state file with Terraform for any production system that's worth anything
- ideally, the only state file in source control is for a separate Terraform stack that bootstraps the S3 bucket for all other Terraform stacks
If you're running only on AWS, and are using agents to write your IaC anyway, use AWS CloudFormation, because it doesn't use state files, and you don't need your IaC code to be readable or comprehensible.
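A minimal sketch of the remote-backend setup described above; the bucket, key, and lock-table names are placeholders, and the bucket itself would come from a separate bootstrap stack:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-company-tf-state"     # created by the bootstrap stack
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "tf-state-lock"           # state locking, prevents concurrent applies
  }
}
```

With this in place, the state never lives on anyone's laptop, so "I forgot to use the state file, as it was on my old computer" simply cannot happen.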
I do not let any `terraform apply` commands occur via automation in my org.
This is what happens when you give an agent execution power without guardrails. The tool isn't the problem; the absence of governance is. In my setup I treat the AI as a junior dev with root access: every destructive operation requires explicit human approval, and the session context includes hard constraints on what it can and can't touch.
The productivity gains from AI agents are real, but only if you invest in the boring part first: deterministic boundaries that don't depend on the model being smart enough not to break things.
Claude code is really dangerous. It doesn't even tell you what it is doing. You have no idea what it is thinking or what it is changing.
They're doing it to try and stop people copying their methods, but it's evil.
That's why you should always have the .tfstate file stored with a cloud provider as a so-called Terraform backend, and why you can do a dry run (`terraform plan`) in these kinds of scenarios to see what will be created before running the actual terraform apply or destroy commands.
I suspect we need to build MCP servers that prevent destructive commands. For example, we need a "bash" tool that doesn't invoke /usr/bin executables directly. The agent should think it is invoking a unix command, but those commands are proxies that prevent destructive operations, with no ability for the agent to circumvent the restrictions. If there isn't an MCP server for your specific setup/need, building one just for your need should be your first step.
Thankfully, I don't know anyone this insane doing a sysadmin job, yet. But if I did, I'd make sure he's fired and never touches prod again.
[dupe] Source: https://alexeyondata.substack.com/p/how-i-dropped-our-produc... (https://news.ycombinator.com/item?id=47275157)
To think I used to find Silicon Valley a bit too much on the nose: https://www.youtube.com/watch?v=m0b_D2JgZgY
The ability to delete backups is wild. That simply should have MFA on it by default. At least for that event, send the owner of the account an email and an SMS: "Hey, all your snapshots are about to be deleted. Enter this OTP if you're sure!" Can't hurt: still automate everything; you just cannot delete snapshots like that, no matter what.
There was a project at Ansible that aimed to address this kinda thing when I worked there. The idea was to write policy-as-code definitions that would prevent users (or AI) from running certain types of automation. I don't know where that project ended up, but reading about this makes me think that they were on to something.
Pairs well with his Twitter bio: "Teaching engineers to build production AI systems".
This is like /r/wallstreetbets loss porn. Why is he posting his own idiocy for clout? I can only guess it's fake and he's trying to gin up rage engagement. It's certainly working on here.
> Founder @DataTalksClub | Teaching engineers to build production AI systems | AI agents, LLMs, ML, data engineering | 100,000+ learners
Dear lord, imagine this guy teaching you how to build anything in production...
This is only the beginning. People are far far too reckless with their LLMs.
I am still heavily checking everything they're doing. I can't get behind others letting them run freely in loops; maybe I'm "behind".
Blaming it on AI agents is the new blaming it on the intern.
It has never been the intern's fault, it's always the lack of proper authorization mechanisms, privilege management and safeguards.
At the most basic level, even just not letting Claude run terraform apply would've solved this issue. Review the god damn plan first! This is like engineering 101
Vibeadministration is coming after vibecoding. Get ready...
Yeah, sure, blame Claude for not having backups. Sure do.
AI is an unknown quantity because it hallucinates.
There is no way to prevent hallucinations.
AI is not production ready.
Anyone that uses AI in a production environment is an idiot.
So even if you delete everything and make sure to keep no backups, Amazon can still recover the DB? What am I missing here?
I can't wait for ChatGPT to control the autonomous weapons; screw it, put it in charge of the nukes!
Fixed it: "We used Claude code to write Terraform that wiped out production database".
I blame not only the engineer who ran the command and Claude, which made the mistake, but also software engineers as a group (because Terraform is way too dangerous a tool to be used by generalist engineers rather than dedicated SREs, yet we have somehow made this the default; I'm happy to be convinced otherwise, but I've seen enough carnage when "senior" engineers fuck up Terraform that it'll be difficult), and also cloud platforms like AWS, which are overly complex and led to Claude's confusion.
Stores his production TF state on his local computer…
I don't think AI is to blame here.
I'm an amateur and would never let AI touch a live database....
This is exactly why you can't replace DevOps engineers with AI
This is why Claude only has read access to all my infrastructure.
Rookie move. Why is Claude Code able to run terraform?
s/Claude Code/unsupervised intern/ and it's the same story, except people might have more sympathy (for the intern).
Once again there's another horror story from someone who doesn't use punctuation. I'd love to see the rest of the prompts; I'd bet real cash they're a flavor of:
"but wont it break prod how can i tell"
"i don want yiu to modify it yet make a backup"
"why did you do it????? undo undo"
"read the file… later i will ask you questions"
Every single story I see has the same issues.
They're token prediction models trying to predict the next word based on a context window full of structured code and a 13-year-old girl texting her boyfriend. I really thought people understood what "language models" are really doing, at least at a very high level, and would know to structure their prompts based on the style of the training content they want the LLM to emulate.
Friendly reminder: most cloud providers have deletion locks. Go and enable them on your prod dbs right now.
Sure, Claude could just remove the lock - but it's one more gate.
Edit: these existed long before agents, and for good reason: mistakes happen. Last week I removed tf destroy from a GitHub workflow, because it was 16px away from apply in the dropdown. Lock your dbs, irrespective of your take on agents.
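In Terraform-managed AWS, that lock plus a belt-and-suspenders lifecycle guard might look like this (resource values are hypothetical):

```hcl
resource "aws_db_instance" "prod" {
  identifier          = "prod-db"        # hypothetical names/sizes
  engine              = "postgres"
  instance_class      = "db.t3.medium"
  allocated_storage   = 100
  deletion_protection = true             # AWS API refuses deletes while this is set

  lifecycle {
    prevent_destroy = true               # Terraform errors on any plan that would destroy this
  }
}
```

An agent (or a tired human) would have to flip both flags in separate steps before a destroy could even be planned, which is exactly the kind of extra gate the parent is arguing for.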
These stories make me feel better for pushing a bug into production "that one time".
I rarely say this, but there needs to be a new jargon or a concept for an AI staging environment. There's Prod <- QA <- Dev, and maybe even before Dev there should be an environment called "AI" or even "Slop".
This is rage bait
> Teaching engineers to build production AI systems | AI agents, LLMs, ML, data engineering |
> In the newsletter, I wrote the full timeline + what I changed so this doesn't happen again.
> If you found this post helpful, follow me for more content like this.
So yeah, this is standard LinkedIn/X influencer slop.
Looks like he never had any idea what he was doing in the first place. Vibe bro classic.
No replicas?
"I gave an automation I didn't control permissions it shouldn't have"
Funny how when Claude Code makes something, people take credit.
Play stupid games, win stupid prizes.
The more you fuck around, the more you find out.
No staging environment?
No prior attempt to follow best practices (e.g. deletion protection in production)? Nor manual gating of production changes?
No attempt to review Claude's actions before performing them?
No management of Terraform state file?
No offline backups?
And to top it off, Claude (the supposed expert tool) didn't repeatedly output "Are you insane? No, I'm not working on that." Clearly Claude wasn't particularly expert; otherwise, like any principal engineer, it would've refused and suggested sensible steps first.
(If you, dear reader of this comment, are going to defend Claude, first you need to declare whether you view it as just another development tool, or as a replacement for engineers. If the former, then yeah, this is user error and I agree with you - tools have limits and Claude isn't as good as the hyped-up claims - clearly it failed to output the obvious gating questions. If the latter, then you cannot defend Claude's failure to act like a senior engineer in this situation.)