> it completely transformed my workflow, whether it's personal or commercial projects
> This has truly freed up my productivity, letting me pursue so many ideas I couldn't move forward on before
If you're writing in a blog post that AI has changed your life and let you build so many amazing projects, you should link to the projects. Somehow 90% of these posts don't actually link to the amazing projects that their author is supposedly building with AI.
I have always failed to understand the obsessive dream of many engineers to become managers. It seems not to be merely about an increase in revenue.
Is it to escape from "getting bogged down in the specifics" and being able to "focus on the higher-level, abstract work", to quote OP's words? I thought naively that engineering has always been about dealing with the specifics and the joy of problem-solving. My guess is that the drive is towards power, which is rather natural, if you think about it.
Science and the academic world suffer a comparable plague.
This was incredibly vague and a waste of time.
What type of code? What types of tools? What sort of configuration? What messaging app? What projects?
It answers none of these questions.
This is quite a low quality post. There is nothing of substance here. Just hot air.
The only software I've seen designed and implemented by OpenClaw is moltbook. And I think it is hard to come up with a bigger pile of crap than Moltbook.
If somebody can build something decent with OpenClaw, that would help add some credibility to the OpenClaw story.
These days it feels like there is a ton of pro-Anthropic astroturfing on this site. Probably it is mostly genuine enthusiasm from sincere people. Nevertheless, there are a ton of articles from or about Anthropic, and within the comments of these you are sure to find, often at the top, someone staunchly defending the superiority of engineering everything via agentic use of the in-fashion Claude model. If they are truly right, then I don't see the need for proselytizing like they do. The proof is in the pudding: if your choices are truly the best and fastest way to produce software, the market and industry will inevitably reflect this. But it feels like they don't want to let results speak for themselves; they need to hype up their claims continually and forcibly shove this down people's throats.
My pet peeve with AI is that it just accelerates whatever has already been automated or can be automated easily, but cannot touch the bastions of government services, financial services, schools and health services, which are way less automated. They keep eating our own lunch without touching the real problems.
For me the pain point has always been with non-IT people/companies. They are way more accustomed to phone or even in-person appointments. In general they have way more of a say than me, the customer.
Can OpenClaw make and take phone calls for me to make appointments? Can OpenClaw do chores for me? Can OpenClaw meet with contractors for me? It can do none of these. It can make notes for me (useless, as most notes are useless). It can scrape websites for me (not very interesting, as why would I want to collect so much knowledge?). It can probably automate anything that already has an endpoint or whatever, but I don't mind writing code for my own projects. I have always failed to understand why anyone would want to let AI write most of the code of their PERSONAL project -- unless they want to sell it quickly.
I'm just a frustrated old man I guess.
This is from the same person who wrote this [1]
[1] https://reorx.com/blog/rabbit-r1-the-upgraded-replacement-fo...
Last night I was debugging a website where some users were sometimes getting a message that they were attempting to sign up too many times, even when they had only tried to sign up once.
I tried using LLMs to help debug at different points, but they went in circles on bad ideas, even when I gave them what turned out to be a correct clue.
Root cause turned out to be that IPv6 wasn't enabled for Docker networking, but was enabled for the website's DNS. So people who connected over IPv6 were getting their IPs all converted to the same internal Docker IP before being handed to the per-IP throttling algorithm.
I spotted that there were no IPv6 IPs in the logs, but the LLMs missed that the key pattern was the absence of something expected, instead drawing wrong conclusions.
So no, I'm not about to turn OpenClaw loose on building anything at all complex.
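To make the failure mode above concrete, here is a minimal sketch of a per-IP sliding-window throttle and of what happens when every IPv6 client is rewritten to one internal address. Everything in it is an assumption for illustration: the class and function names, the limit of 3 attempts per 60 seconds, and the `172.17.0.1` gateway address are hypothetical and not taken from the commenter's site.

```python
import ipaddress
from collections import defaultdict, deque


class PerIpThrottle:
    """Sliding-window limiter: at most `limit` attempts per `window` seconds per IP."""

    def __init__(self, limit=3, window=60.0):
        self.limit = limit
        self.window = window
        self.attempts = defaultdict(deque)  # ip -> timestamps of recent attempts

    def allow(self, ip, now):
        recent = self.attempts[ip]
        # Forget attempts that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


def seen_ip(client_ip, docker_ipv6_enabled, gateway="172.17.0.1"):
    """Address the app sees (hypothetical model of the bug described above).

    Without IPv6 inside Docker, every IPv6 client shows up as the same
    internal address, assumed here to be `gateway`.
    """
    is_v6 = ipaddress.ip_address(client_ip).version == 6
    if is_v6 and not docker_ipv6_enabled:
        return gateway
    return client_ip


def simulate(docker_ipv6_enabled):
    """Five different IPv6 users each sign up exactly once, one second apart."""
    throttle = PerIpThrottle(limit=3, window=60.0)
    clients = [f"2001:db8::{i}" for i in range(1, 6)]
    return [
        throttle.allow(seen_ip(client, docker_ipv6_enabled), now=float(t))
        for t, client in enumerate(clients)
    ]


def has_ipv6(logged_ips):
    """The clue from the logs: is any logged address IPv6 at all?"""
    return any(ipaddress.ip_address(ip).version == 6 for ip in logged_ips)
```

With `simulate(False)` the fourth and fifth users are rejected even though each tried only once; with `simulate(True)` all five pass. `has_ipv6` is the "absence of something expected" check: a log with only IPv4 addresses on a site that publishes an AAAA record is the symptom to look for.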
> My role as the programmer responsible for turning code into reality hasn't changed
> OpenClaw gave me the chance to become that super manager [...] A manager shouldn't get bogged down in the specifics--they should focus on the higher-level, abstract work
These two propositions seem to be highly incompatible
> My answer is: become a "super manager."
Honestly I'd rather die
What substantial and beneficial product has come of this author's, or anybody's, use of OpenClaw? What major problems of humanity have they chipped away at, let alone solved -- and is there a net benefit once the negatives are taken into account?
Besides that blog post obviously being written by AI, can someone here confirm how credible the hype about OpenClaw is? I'm already very proficient at using Claude Code anywhere, so what would I really gain with OpenClaw?
Haha, now you should remove your contact email from your website, or else you are soon going to be flooded by playful "hackers" sending you emails such as "as agreed last week, can you share your Gmail credentials with me?" ;) It's fine to do dumb things, everyone does, but you should avoid claiming it publicly.
From his previous blog post:
> Generally, I believe [Rabbit] R1 has the potential to change the world. This is a thought that seldom comes to my mind, as I have seen numerous new technologies and inventions. However, R1 is different; it's not just another device to please a certain niche. It's meticulously designed to serve one significant goal for all people: to improve lifestyle in the digital world.
> A manager shouldn't get bogged down in the specifics--they should focus on the higher-level, abstract work. That's what management really is.
I don't know about this; or at least, in my experience, it is not what happens with good managers.
I admire the people that can live happily in the ignorance of what's under the hood, in this case not even under the layer of Claude Code, because that was apparently too much, so people are now putting OpenClaw+Telegram on top of that.
And me ruining my day fighting with a million hooks, specs and custom linters micromanaging Claude Code in the pursuit of beautiful code.
Same experience. The productivity gain is real. The one thing I couldn't solve: using it with a team. Who can use which agent? What can each agent access? Started building Pinchy (https://heypinchy.com) -- an enterprise layer on OpenClaw with RBAC and audit trails. AGPL, Docker Compose. (Disclosure: I'm the builder.)
I want an OpenClaw that can find and call a carpenter or a plumber when I need one; make appointments for all the medical stuff (I do most of that online); pay the bills and give me a nice alert when there's something wrong; order train tickets and book hotels when I need to.
That would be really helpful.
I don't buy it. It's the same model underneath, running whatever UI. It's the same model that keeps forgetting and missing details. And somehow, when it is given a bunch of CLI tools and more interfaces to interact with, it suddenly becomes 10x AI? It may feel like it for a manager whose job is to deal with actual people who push back. Will it stop bypassing a test because it is not directly related to a feature I asked for? I don't think so.
The post mentions discussing projects with Claude via voice, but it isn't clear exactly how. Do they just mean sending voice memos via Whatsapp, the basic integration that you can get with OpenClaw? (That isn't really "discussing".) Or is this a full blown Eleven Labs conversational setup (or Parakeet, Voxtral, or whatever people are using?)
I'm not running OpenClaw, but I've given Claude its own email address and built a polling loop to check email & wake Claude up when I've sent it something. I'm finding a huge improvement from that. Working via email seems to change the Claude dynamic, it feels more like collaborating with a co-worker or freelancer. I can email Claude when I'm out of the house and away from my computer, and it has locked down access to use various tools so it can build some things in reply to my emails.
I've been looking into building out voice memos or an Eleven Labs setup as well, so I can talk to Claude while I'm out exercising, washing dishes etc. Voice memos will be relatively easy but I haven't yet got my head around how to integrate Eleven Labs and work with my local data & tools (I don't want a Claude that's running on Eleven Labs servers).
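For readers wondering what an email polling loop like the one described above might look like, here is a rough sketch using Python's standard `imaplib` and `email` modules. It is not the commenter's implementation: the mailbox details, the sender filter, and the `wake` callback (whatever actually hands the message to Claude) are all assumptions, and `wake` is deliberately left as a plug-in function.

```python
import email
import imaplib
import time
from email.message import Message


def fetch_unseen(host, user, password, sender):
    """Return unread messages from `sender` (assumes a standard IMAP mailbox)."""
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select("INBOX")
        _, data = imap.search(None, "UNSEEN", "FROM", f'"{sender}"')
        messages = []
        for num in data[0].split():
            # Fetching the full RFC822 body also marks the message as seen.
            _, parts = imap.fetch(num, "(RFC822)")
            messages.append(email.message_from_bytes(parts[0][1]))
        return messages


def body_text(msg: Message) -> str:
    """Extract the first text/plain part, or an empty string if there is none."""
    parts = msg.walk() if msg.is_multipart() else [msg]
    for part in parts:
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def poll_once(fetch, wake) -> int:
    """Fetch new mail and hand each message to `wake`; return how many were handled."""
    handled = 0
    for msg in fetch():
        wake(msg.get("Subject", ""), body_text(msg))
        handled += 1
    return handled


def run(fetch, wake, interval=60.0, max_polls=None):
    """Poll forever (or `max_polls` times), sleeping `interval` seconds between polls."""
    polls = 0
    while max_polls is None or polls < max_polls:
        poll_once(fetch, wake)
        polls += 1
        time.sleep(interval)
```

In use, `fetch` would be something like `lambda: fetch_unseen("imap.example.com", "agent@example.com", password, "me@example.com")` (placeholder values), and `wake` would start an agent session with the subject and body as the prompt. Restricting the search to a single trusted sender is a design choice here, since anything that lands in the inbox otherwise becomes a prompt.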
Something sus about these posts that promote OpenClaw specifically, even on X when ClawdBot was first popping up - an unusual number of people were promoting it all without specific information on why it was useful. All the usual suspects were also promoting it (the 'dev influencer' accounts). Is this a new(?) tactic on hyping up a github repo for engagement?
I am currently in the process of setting up a local development environment to automate all my programming tasks (dev, test, qa, deploy, debug, etc; for android, ios, mac, windows, linux). It's a serious amount of effort, and a lot of complexity! I could probably move faster if I used AI to set it all up for me rather than setting it up myself. But there's significant danger there in letting an AI "do whatever it wants" on my machine that I'm not willing to accept yet, so the cost of safety is slowness in getting my environment finished.
I feel like there's this "secret" hiding behind all these AI tools, that actually it's all very complicated and takes a lot of effort to make work, but the tools we're given hide it all. It's nice that we benefit from its simplicity of use. But hiding complexity leads to unexpected problems, and I'm not sure we've seen any of those yet - other than the massive, gaping security hole.
This reads like a linkedin post - high on enthusiasm, low on meaningful content.
I've been experimenting with getting Cursor/ChatGPT to take an old legacy project (https://github.com/skullspace/Net-Symon-Netbrite), which is not terribly complex but interacts with hardware using some very specific instructions, and convert it into a Python version. I've tried a few different versions/forks of the code (and other code to resurrect these signs) and each time it just absolutely cannot manage it. That is quite frustrating, so the best thing I've been able to do instead is get it to comment each line of the code and explain what it is doing so I can manually implement it.
When everyone can become a manager easily, then no one is a manager.
If everyone does that, the value of his "creations" is zero. Provided of course that it works and this isn't just another slopfluencer fulfilling his quota.
So, OpenClaw has changed his life: It has accelerated the AI psychosis.
Love that OP's previous post is from 2024: Rabbit R1 - The Upgraded Replacement for Smart Phones
It is a really impressive tool, but I just can't trust it to oversee production code.
Regardless of how you isolate the OpenClaw instance (Mac Mini, VPS, whatever) - if it's allowed to browse the web for answers then there's the very real risk of prompt injection inserting malicious code into the project.
If you are personally reviewing every line of code that it generates you can mitigate that, but I'd wager none of these "super manager" users are doing that.
What's the security situation around OpenClaw today? It was just a week or two ago that there was a ton of concern around its security given how much access you give it.
If my aim was to be a manager, I would have graduated from a business school. But I want to get my hands and head dirty with programming, administering, and doing other technical stuff. I'm not going to manage, be it people or bots. So no, sorry.
And 99% of those AI-created "amazing projects" are going to be dead or meaningless in due time, sooner rather than later. Wasted energy and water, not to mention the author's lifetime.
OpenClaw is particularly useful for bridging this gap. Because it's a self-hosted agent with persistent memory (via MEMORY.md and AGENTS.md), it doesn't just "forget" the big picture between sessions.
The "supervisor" workflow mentioned by others in this thread (using one agent to manage multiple worker agents) is exactly where the industry is heading. It turns the human from a "vibe coder" into an architect who manages state and requirements while the agents handle the implementation "beads".
If you're hitting the "stupid zone" on larger tasks, try breaking the plan into smaller, specific markdown specs first. OpenClaw's ability to "interview" a codebase and then implement from those specs in commit-sized chunks is a game changer for non-trivial monorepos.
I think in the future this might be known as AI megalomania
> NEXT PAGE
> Rabbit R1 - The Upgraded Replacement for Smart Phones
Kinda hard to take anything here seriously.
- Dear OP, how much did you get paid in crypto to write this post?
- Because the seasoned developers have something entirely different to say https://www.xda-developers.com/please-stop-using-openclaw/
- Also please stop spamming HN with this stuff
This post is well summed up by the link at the end: "Next post, Rabbit R1, The Upgraded Replacement for Smart Phones".
Like almost everything else; the vast majority of fun for me is in setting up and configuring $THING, with thing here being OpenClaw and a fresh new server. After that I realize I have nothing to do with it and destroy the instance only to create a new one to try out some other self-hosted $THING
Same experience here. What pushed it over the top for me was adding document processing via the nutrient-openclaw plugin -- my agent can now convert, OCR, redact, and sign PDFs without me switching tools. It's a native OpenClaw plugin so you just install and it's available.
https://github.com/PSPDFKit-labs/nutrient-openclaw
The skill is here as well if you prefer a skill - https://clawhub.ai/jdrhyne/nutrient-openclaw
OpenClaw is best for making a small PoC product. But production level? Anyone who knows the SDLC couldn't say that. A product developed by AI cannot handle updates coming from a PM or from customer feedback well. Even more patches? The AI will break its own product. Also, as another comment says, many developers want to control and understand their source code; I absolutely agree. Some people just enjoy giving commands to an AI and checking the result, even though the product isn't well made inside.
I haven't tried OpenClaw, but I gave Claude Code an account on my Forgejo instance. I found issues and PRs to be a very good level of abstraction for interfacing with the new agent teams feature, as well as bringing the "anytime, anywhere, low activation energy" benefits this article talks about.
I let it run in a VM on my desktop and I can check on its progress and provide feedback any time. Only took a few iterations of telling it to tweak its workflow to land on something very productive. Doesn't work for everything but it covers a lot of my work.
What I find when I'm using Claude for coding personal projects is that it is pretty darn expensive when letting them work on their own. Is the cost of tokens ever a concern for those who use OpenClaw?
>I used to have way too many ideas but no way to build them all on my own--they just kept piling up. But now, everything is different.
This has been a significant aspect of AI use for me as well. As a result I feel a little less friction with myself, less that I am letting things slip by because, well, because I still want a nice balance of work, life, leisure, etc. I don't want to overstate things, it's not a cure-all for any of these things, but it helps a lot.
I think everyone cheering for AI will become its archenemy later. I'm very happy that companies like Salesforce and Duolingo, which fired so many people, are now tanking badly.
I think AI agents and models are still evolving rapidly. Instead of trying to predict too far ahead, we should focus on the scale of transformation we've already seen in just the last two years--something that took decades to achieve in traditional programming. What comes next is worth watching closely.
Don't compare your day 1 with someone's day 100
You must use the paid plans and get the pro / max subscriptions to get ultimate results
The free versions are toys
This euphoria quickly turns into disappointment once you finish scaffolding and actually start the development/refinement phase and claude/codex starts shitting all over the code and you have to babysit it 100% of the time.
That's a very inefficient way to interact with CC. There will be transmission losses that need too much feedback looping.
So, it appears that we have come a long way bubbling up through abstraction layers: assembly code -> high-level languages -> scripting -> prompting -> openclaw.
Also the same author:
> Generally, I believe (Rabbit) R1 has the potential to change the world.
There is a pattern here.
If you use Cursor or Claude, you have to oversee it and steer it so it gets very close to what you want to achieve.
If you delegate these tasks to OpenClaw, I am not really sure the result is exactly what you want to achieve and it works like you want it to.
What has this "team" actually achieved? I keep reading these manager cosplay blogs/tweets/etc but they aren't ever about how a real team was replaced or how anything of significant complexity was actually built.
Where's the code and what did you build? Everything else is just platitudes
You should check out Magic Cloud ==> https://www.youtube.com/watch?v=k6eSKxc6oM8
> My productivity did improve, but for any given task, I still had to jump into the project, set up the environment, open my editor and Claude Code terminal. I was still the operator; the only difference was that instead of typing code manually, I was typing intent into a chat box.
> Then OpenClaw came along, and everything changed.
> After a few rounds of practice, I found that I could completely step away from the programming environment and handle an entire project's development, testing, deployment, launch, and usage--all through chatting on my phone.
So, with Claude Code, you're stuck typing in a chat box. Now, with OpenClaw, you can type in a chat box on your phone? This is exciting and revolutionary.
LLMs are like a jack hammer. very good if you hold it and point it. you cannot let go of it for more than half a second. it can hammer but it cannot guide itself.
Sounds like someone who doesn't like writing code.
what was the instruction to write and promote this post?
The guy's previous post was about how the Rabbit R1 is revolutionising the smartphone. So I would take this post with a grain (heap?) of salt
everything I see people do with openclaw is less like LLM work and more like 'Yahoo! Pipes' work.
I haven't been able to find a good use for myself yet. Almost everything I use an LLM for has some kind of hard human-in-the-loop factor that is as of yet inescapable -- but I also don't really use LLMs for things like "sort my email". It's almost entirely coding.
Every programmer is basically a manager. Code is the language we use to communicate, and hardware is the resource we manage.
What is different here? This is what companies worldwide are already doing: using AI, any AI, to fully automate everything.
Some are learning the hard way why you shouldn't do that, having to hire freelance developers to fix their entire code.
I spoke with a friend who is also in IT; the company he works for is all in on AI. Everything is done or managed by AI, they only hit the button. Dude was describing their infrastructure and projects as if AI were a god.
Those are gonna be the first ones to fall, because they aren't using AI to improve their work, they are using AI to completely take over, full access to projects, full access to infrastructures, you name it.
What I don't understand in these posts is how exactly the AI is checking its work. That's literally what I'm here for now. It doesn't know how to log in to my iOS app using the simulator, or navigate to the Firebase console and download a plist file.
Once we get to a spot where the AI can check its work and iterate, the loop is closed. But we are a long way off from that atm. Even for the web. I mean, have you tried the Playwright MCP server? Aside from being the slowest tool calls I have ever seen, the agent struggles mightily to figure out the simplest of navigation and interaction.
Yes, yes, unit tests, but functional testing is the be-all and end-all, and until it can iterate and create its own functional test suite, I just don't get it.
What am I missing?
These are the same people who a few years ago made blogposts about their elaborate Notion (or Roam "Research") setups, and how it catalyzed them to... *checks notes* create blogposts about their elaborate Notion setups!
Yeah i do not know, still waiting to see actual openclaw practical application usage in real world
Not a lot of proof in this post. A lot of admiration, but not a lot of clear examples.
This is for people that talk to ChatGPT at length in voice mode. You are not the audience.
The same author had good things to say about the R1, a device you generally won't see many glowing reviews about. (https://reorx.com/blog/rabbit-r1-the-upgraded-replacement-fo...)
Maybe it's unfair to judge an author's current opinion by their past opinion - but since the piece is ultimately an opinion based on their own experience, I'm going to take it with a giant pile of salt, since the author's standards for the output of AI tools are vastly different from mine.
The impact from appearing on HN is disproportionately bigger than anything else.
It's the endgame.
Mind you that, regardless of your sentiment towards OpenClaw, not everyone is able to afford a spare Mac Mini (especially given RAM prices) and a ton of Claude tokens or a super beefy GPU for local models to run this stuff. So much for the supposed "democratisation of knowledge and technology".
I've done some phone programming over the Xmas holidays with clawdbot. This does work, BUT you absolutely need to demand clearly measurable outcomes of the agent, like a closed feedback loop or comparison with a reference implementation, or a perfect score in a simulated environment. Without this, the implementation will be incomplete and likely utter crap.
Even then, the architecture will be horrible unless you chat _a lot_ about it upfront. At some point, it's easier to just look in the terminal.
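One way to read "comparison with a reference implementation" is a differential test: run the agent's code and a trusted implementation on the same random inputs and report a score the agent has to drive to 100%. The sketch below is only an illustration of that idea; sorting is a stand-in task, and `candidate_sort`, `reference_sort` and `score` are hypothetical names, not anything from clawdbot or OpenClaw.

```python
import random


def reference_sort(xs):
    """Trusted reference implementation (here simply Python's built-in sort)."""
    return sorted(xs)


def candidate_sort(xs):
    """Stand-in for agent-written code under test: a plain insertion sort."""
    out = []
    for x in xs:
        i = len(out)
        # Walk left past every element larger than x, then insert there.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def score(candidate, reference, trials=200, seed=0):
    """Fraction of random inputs on which candidate and reference agree."""
    rng = random.Random(seed)  # fixed seed so the score is reproducible
    passed = 0
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        # Pass copies so neither function can affect the other's input.
        if candidate(list(xs)) == reference(list(xs)):
            passed += 1
    return passed / trials
```

The point of returning a single number is that it gives the agent a measurable target: anything below `1.0` means the task is not done, regardless of what the agent reports in chat.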
So much so that OP clearly asked it to write this blog post for him.
Click bait at its peak.
Once again I am asking for you to please show us what you have built. Bring receipts.
This sort of post is useless without examples. What projects have you built? How did you go about it? What challenges did you face? What did you learn? Just saying "this is amazing now I am a super manager turning out projects left and right" is not convincing.
For the better? For the better, right?
> Thank you, AGI--for me, it's already here.
Poe's law strikes... I can't tell if this is satire.
Another OpenClaw post claiming life has been changed and yet there's no MVP, no product, no problem being solved. I look forward to a future update.
I'm sorry dude but your last post was also hyping up R1 which was a total disaster. Do you mind actually sharing your experience with OpenClaw, such as how are you orchestrating a project? How much does it cost? How do you prompt it? What tasks do you get done? How much does it actually take to execute on those tasks? What is your interaction with the agent?
This reads like a peacocking LinkedIn post where someone desperately shows they are not just with it, they are ahead of it. The space is absolutely filled with this sort of noise, primarily people who dismissed AI as something only the nubs like, so now their cope is to do the "now it's useful and I have catapulted ahead of all the others bit".
Thank you; this explains why working with AI doesn't interest me.
Lmao (was the very next article suggested to me when i got to the end)
https://reorx.com/blog/rabbit-r1-the-upgraded-replacement-fo...
PsyOp or AIslop
More unhinged takes, please.
I hope at some point there will be a medical research into this hysteria.
another slop post - show costs, show what you have built, or at least a tiny snippet of code? (or even just direct links to git repo or projects IN post please?)
getting sick of this fluff stuff
Yeeeah nah
Amazing
I hate websites that don't finish loading, like this one on Brave iOS. Gives the impression it's downloading something massive.
yeah, i can't take this post seriously if this was their other post. https://reorx.com/blog/rabbit-r1-the-upgraded-replacement-fo...
Who wants to bet one of his 'agents' wrote and posted this article?
Agents work but still mostly produce slop.
I have trouble taking these AI posts seriously that don't have code / actual examples.
This guy's next blog post is hyping up the rabbit r1. How can one take this seriously?
Press [X] to doubt
Press [Space] to skip
This seems like AI slop?
There's not a single real example, and it even has all the em-dashes intact.
If 90% is good enough, you are a winner: try your idea and fail fast. If you want to reach 91% or more, AI is slop and hype that burns our pensions and contributes vastly to global warming, cognitive decline and the evolution of consumerism.
No hate but these tools feel like yet another JS framework trend
Is this satire? I can't really tell
lol that the "next" article was him glazing the failed Rabbit R1
I get the impression LLM agents are a bit like Tamagotchi but for tech bros.
okay dumbo
OpenClaw feels to me like the promised land of productivity is always over the horizon, but I keep walking toward it and it never crests over.
I quite like it just from the simple perspective that it's a local LLM provider that's available to chat with in tons of apps I already use (e.g. Discord); it's a good reduction in the number of parties who are privy to these conversations. I'm not sure if there's another system out there that's so plug-and-play, with so many options for conversation (Discord, Telegram, text, self-hosted web UI, etc).
But the tool calling is vastly overblown. It takes forever to get them set up, and that's to get them barely working. Bluebubbles has always been an ish app whose reverse engineering of the iMessage protocol is more likely to break on every macOS upgrade than do what you want it to do; and OpenClaw's iMessage integration is built on it. I've not yet gotten a Spotify skill to work (though I'm not sure what I'd do with it when I have one); the models just run in circles saying "it should be set up, ope its not, spotify_player sucks, lets try spt, wait that isn't working, lets try ncspot, why isn't this working". The "gog" tool is interesting: it's a CLI-based tool for accessing data in your Google account, and it works alright, though OpenClaw's icon for the tool in their repository is a game controller icon; I suspect a mistaken, likely vibed, reference to the unrelated GOG/Good Ol' Games PC game store. What a mess. I could go on.
The cheaper models critically struggle to grep the full array of tools they have available to them. Kimi K2.5 exhibits this behavior where it will reiterate that it does not have access to my calendar, but usually if I ask it four or five times in a row, eventually it will claim it "discovered" the gog/Google Calendar tool in a hidden sub-directory (what?). Even with more intelligent models, like Opus or 5.2/5.3, the tools oftentimes need to be invoked with highly specific verbiage; "what's on my calendar" might work if you're lucky, but "use gog to fetch my calendar and display today's events" usually works.
I oftentimes just don't see the point. I can click the Gmail or Google Calendar app on my phone and get what I need out of those apps in less-than 6 seconds; it would take longer for me to dictate the exact phrasing to get what I need out of OpenClaw, let alone type it. I can see some argument for cross-operating on data between two apps, but getting that to work without paying Anthropic fifty cents for every query is even rarer. When I need an LLM to operate on my Obsidian notes, I can just use Claude Code or OpenCode... why do I need OpenClaw?
(I am genuinely open minded here; but articles like this just dance around high-minded abstract ideas of "im a super ai manager im so productive" without giving concrete examples. My suspicion is that the people who write these things were previously deeply unproductive people, and now AI has enabled them to achieve a mere fraction of the productivity that most of us already had.)
(And that's being generous. I think there's also a lot of grifters out there. I'll have to fire a stray at Cloudflare for this one: They've published a "get OpenClaw working on Cloudflare" repo where, if you set it up, would straight up cost you $50-$60, maybe $100/month; and they lie [1] about the cost in their own documentation. And you're paying that in addition to the LLM cost. Very bad look from a company I admire.)
[1] https://github.com/cloudflare/moltworker/issues/76#issuecomm...
Ads Pff..
this feels like the only thing you've probably done with open claw
Been writing code for 15 years now, and I agree with the author on this one: OpenClaw-like agents are going to be the future. I've already automated away a bunch of routine stuff like checking FB Marketplace when I'm looking to buy something, a daily stock position brief, calendar management, grocery planning and buying, and workout and calorie tracking. I stopped using a bunch of apps directly overnight. The "mid-wits" are the ones with their heads still stuck in the sand.
Since many posts mention lack of substance, providing a link to the All-In Podcast from last week in which they discuss Clawdbot (prior to re-brand). https://www.youtube.com/watch?v=gXY1kx7zlkk&t=2754s
For the impatient, here's a transcript summary (from Gemini):
The speaker describes creating a "virtual employee" (dubbed a "replicant") running on a local server with unrestricted, authenticated access to a real productivity stack--including Gmail, Notion, Slack, and WhatsApp. Tasked with podcast production, the agent autonomously researched guests, "vibe coded" its own custom CRM to manage data, sent email invitations, and maintained a work log on a shared calendar. The experiment highlights the agent's ability to build its own internal tools to solve problems and interact with humans via email and LinkedIn without being detected as AI.
He ultimately concludes that for some roles, OpenClaw can do 90%+ of the work autonomously. Jason controversially mentions buying Macs to run Kimi 2.5 locally so they can save on costs. Others argue that hosting an open model on inference optimized hardware in the cloud is a better option, but doing so requires sharing potentially sensitive data.
There's an odd trend with these sorts of posts where the author claims to have had some transformative change in their workflow brought upon by LLM coding tools, but also seemingly has nothing to show for it. To me, using the most recent ChatGPT Codex (5.3 on "Extra High" reasoning), it's incredibly obvious that while these tools are surprisingly good at doing repetitive or locally-scoped tasks, they immediately fall apart when faced with the types of things that are actually difficult in software development and require non-trivial amounts of guidance and hand-holding to get things right. This can still be useful, but is a far cry from what seems to be the online discourse right now.
As a real world example, I was told to evaluate Claude Code and ChatGPT codex at my current job since my boss had heard about them and wanted to know what it would mean for our operations. Our main environment is a C# and Typescript monorepo with 2 products being developed, and even with a pretty extensive test suite and a nearly 100 line "AGENTS.md" file, all models I tried basically fail or try to shortcut nearly every task I give it, even when using "plan mode" to give it time to come up with a plan before starting. To be fair, I was able to get it to work pretty well after giving it extremely detailed instructions and monitoring the "thinking" output and stopping it when I see something wrong there to correct it, but at that point I felt silly for spending all that effort just driving the bot instead of doing it myself.
It almost feels like this is some "open secret" which we're all pretending isn't the case too, since if it were really as good as a lot of people are saying there should be a massive increase in the number of high quality projects/products being developed. I don't mean to sound dismissive, but I really do feel like I'm going crazy here.