I wrote this the other day:
> Hallucinations can sometimes serve the same role as TDD. If an LLM hallucinates a method that doesn’t exist, sometimes that’s because it makes sense to have a method like that and you should implement it.
— https://www.threads.com/@jimdabell/post/DLek0rbSmEM
I guess it’s true for product features as well.
The music notation tool space is balkanized in a variety of ways. One of the key splits is between standard music notation and tablature, which is used for guitar and a few other instruments. People are generally on one side or the other, and the notations are not even fully compatible: tablature captures information that standard notation doesn't, and vice versa. That includes fingering, articulations, "step on fuzz pedal now," that sort of thing.
The users are different, the music that is notated is different, and for the most part if you are on one side, you don't feel the need to cross over. Multiple efforts have been made (MusicXML, etc.) to unify these two worlds into a superset of information. But the camps are still different.
So what ChatGPT did is actually very interesting. It hallucinated a world in which tab readers would want to use Soundslice. But, largely, my guess is they probably don't... today. In a future world, they might? Especially if Soundslice then adds features that let tab readers get more out of the result.
I think folks have taken the wrong lesson from this.
It’s not that they added a new feature because there was demand.
They added a new feature because technology hallucinated a feature that didn’t exist.
The savior of tech, generative AI, was telling folks a feature existed that didn’t exist.
That’s what the headline is, and in a sane world the folks that run ChatGPT would be falling over themselves to be sure it didn’t happen again, because next time it might not be so benign as it was this time.
This is called product-channel fit. It's great the writer recognized how to capture the demand from a new acquisition channel.
This is an interesting example of an AI system effecting a change in the physical world.
Some people express concerns about AGI creating swarms of robots to conquer the earth and make humans do its bidding. I think market forces are a much more straightforward tool that AI systems will use to shape the world.
What this immediately makes me realize is how many people are currently trying to figure out how to intentionally get AI chatbots to send people to their site, the way ChatGPT was sending people to this guy's site. SEO for AI. There will be billions in it.
I know nothing about this. I imagine people are already working on it; I wonder what they've figured out.
(Alternatively, in the future can I pay OpenAI to get ChatGPT to be more likely to recommend my product than my competitors?)
Anyone who has worked at a B2B startup with a rogue sales team won't be surprised at all by quickly pivoting the backlog in response to a hallucinated missing feature.
I find it amusing that it's easier to ship a new feature than to get OpenAI to patch ChatGPT to stop pretending that feature exists (not sure how they would even do that, beyond blocking all mentions of Soundslice entirely).
> ChatGPT was outright lying to people. And making us look bad in the process, setting false expectations about our service.
I find it interesting that any user would attribute this issue to Soundslice. As a user, I would be annoyed that GPT is lying and wouldn't think twice about Soundslice looking bad in the process.
We (others at my company, not me) hit this problem, not with ChatGPT but with our own AI chatbot that was doing RAG on our docs. It was occasionally hallucinating a flag that didn't exist, so it was treated as product feedback. Maybe that exact flag wasn't needed, but something was missing, and so the LLM hallucinated what it saw as an intuitive option.
I had a smaller version of this when coding on a flight (with no WiFi! The horror!) over the Pacific. Llama hallucinated array-element operations and list comprehensions in C#. I liked the shape of the code otherwise, so, since I was using custom classes, I just went ahead and implemented both features.
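One way to turn hallucinated flags into product feedback, sketched in Python under assumptions: the CLI name `mytool`, its two flags, and the `unknown_flag_counts` tally are all hypothetical. The standard library's `argparse.ArgumentParser.parse_known_args` returns unrecognized arguments instead of exiting, so they can be counted rather than just rejected.

```python
import argparse
from collections import Counter

# Hypothetical tally of flags that users (or an LLM) tried but the CLI doesn't define.
unknown_flag_counts = Counter()


def build_parser() -> argparse.ArgumentParser:
    # "mytool" and its flags are placeholders for illustration.
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("--output", default="out.txt")
    parser.add_argument("--verbose", action="store_true")
    return parser


def parse_and_record(argv):
    """Parse argv, recording unknown flags instead of exiting with an error."""
    args, unknown = build_parser().parse_known_args(argv)
    for token in unknown:
        if token.startswith("--"):
            # Count "--format=json" and "--format json" under the same flag name.
            unknown_flag_counts[token.split("=", 1)[0]] += 1
    return args, unknown


args, unknown = parse_and_record(["--verbose", "--dry-run", "--format=json"])
print(args.verbose, unknown)  # True ['--dry-run', '--format=json']
```

Periodically reviewing the most common entries in the tally is the "product feedback" part; the parsing itself stays strict about what the tool actually supports.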
I also went back to just sleeping on those flights and using connected models for most of my code generation needs.
I'm having the same problem (and had a rant about it on X a few weeks ago [1]).
We get ~50% of traffic from ChatGPT now, unfortunately a large amount of the features it says we have are made up.
I really don't want to get into a state of ChatGPT-Driven-Development as I imagine that will be never ending!
I've come across something related when building the indexing tool for my vintage ad archive using OpenAI vision. No matter how I tried to prompt-engineer the entity extraction into the structure I had defined, OpenAI simply had its own ideas. Some of those ideas were actually good! For example, it was extracting celebrity names, which I hadn't thought of. For other things, it would simply not follow my instructions. So I decided to mostly match what it chooses to give me, and I have a secondary mapping on my end to get to the final structure.
Here's the thing: I don't think ChatGPT per se was the impetus to develop this new feature. The impetus was learning that your customers desire it. ChatGPT is operating as a kind of "market research" tool here, albeit in a really unusual, inverted way. That said, if someone could develop a market research tool that worked this way, i.e. users went to it instead of you having to use it to go to users, I can see it making quite a packet.
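A minimal sketch of that kind of secondary mapping, assuming the model returns a flat dict of entities; the key names in `KEY_MAP` and the target schema are hypothetical, not the commenter's actual structure. Unrecognized keys are kept in `extras` rather than dropped, since some of the model's own ideas (like celebrity names) turned out to be useful.

```python
# Hypothetical mapping from keys the model tends to return to our own schema.
KEY_MAP = {
    "brand": "advertiser",
    "company": "advertiser",
    "product_name": "product",
    "celebrities": "celebrities",
    "year": "year",
}


def normalize_entities(raw: dict) -> dict:
    """Map the model's own field names onto the final structure."""
    final = {"advertiser": None, "product": None, "celebrities": [], "year": None}
    extras = {}
    for key, value in raw.items():
        target = KEY_MAP.get(key.lower())
        if target is None:
            # Unknown field: keep it; some of the model's ideas are worth adopting.
            extras[key] = value
        elif target == "celebrities":
            # Accept either a single name or a list of names.
            final["celebrities"].extend(value if isinstance(value, list) else [value])
        elif final[target] is None:
            # If two synonyms appear, the first one in the model's output wins.
            final[target] = value
    final["extras"] = extras
    return final


print(normalize_entities({"Brand": "Acme", "celebrities": "Jane Doe", "decade": "1950s"}))
```

If a field keeps showing up in `extras`, that's a hint it might deserve a place in the real schema.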
People forget that while technology grows, society also grows to support that.
I already strongly suspect that LLMs are just going to magnify the dominance of Python, as LLMs can remove the most friction from its use. Then will come the second-order effects, where libraries are explicitly written to be LLM-friendly, further removing friction.
LLMs write code best in Python -> Python gets used more -> Python gets optimized for LLMs -> LLMs write code best in Python
A significant number of new signups at my tiny niche SaaS now come from ChatGPT, yet I have no idea what prompts people are using to get it to recommend my product. I can’t get it to recommend my product when I try some obvious prompts myself on other people’s accounts (though it does work on my account, of course, because it sees my chat history).
Pretty good example of how a super-intelligent AI can control human behavior, even if it doesn't "escape" its data center or controllers.
If the super-intelligent AI understands human incentives and is in control of a very popular service, it can subtly influence people to its agenda by using the power of mass usage. Like how a search engine can influence a population's view of an issue by changing the rankings of news sources that it prefers.
There are a few things that could be done in a situation like that:
1. I might treat it like any other feature request. If it is not already in the feature request tracker, it could be added there. It might be accepted or rejected, or more discussion may be wanted, and/or other changes made, etc., like any other feature request.
2. I might add a FAQ entry specifying that the program does not have such a feature, and that ChatGPT is wrong. This does not necessarily mean that it will never be added, if there is a good reason to do so. If there is a good reason not to include it, that will be mentioned, too. The entry might also mention other programs that can be used instead if this one doesn't work.
Also note that in the article, the second ChatGPT screenshot has a note on the bottom saying that ChatGPT can make mistakes (which, in this case, it does). Their program might also be made to detect ChatGPT screenshots and to display a special error message in that case.
Along these lines, a useful tool might be a BDD framework like Cucumber that instead of relying on written scenarios has an LLM try to "use" your UX or API a significant number of times, with some randomization, in order to expose user behavior that you (or an LLM) wouldn't have thought of when writing unit tests.
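The core loop of that idea can be sketched without any LLM at all. In this Python sketch a seeded random proposer stands in for the model, and `FakeApp`, its `SUPPORTED` actions, and `CANDIDATE_CALLS` are hypothetical; a real version would prompt a model for the next call and run it against your actual UX or API. The output is a tally of attempted actions the system doesn't support.

```python
import random
from collections import Counter

# Stand-in for an LLM: plausible-sounding calls, some of which the app lacks.
CANDIDATE_CALLS = ["upload_pdf", "upload_ascii_tab", "export_midi", "transpose", "share_link"]


class FakeApp:
    """Toy system under test with a fixed set of supported actions."""

    SUPPORTED = {"upload_pdf", "export_midi", "transpose"}

    def perform(self, action: str) -> bool:
        return action in self.SUPPORTED


def random_proposer(rng: random.Random) -> str:
    # A real harness would ask a model what a user would try next.
    return rng.choice(CANDIDATE_CALLS)


def explore(app, propose, runs=1000, seed=0) -> Counter:
    """Try many proposed actions and tally the ones the app does not support."""
    rng = random.Random(seed)
    unsupported = Counter()
    for _ in range(runs):
        action = propose(rng)
        if not app.perform(action):
            unsupported[action] += 1
    return unsupported


print(explore(FakeApp(), random_proposer).most_common())
```

Seeding the random generator keeps runs reproducible, which matters if the tally is going to feed into test reports.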
"A Latent Space Outside of Time"
> Correct feature almost exists
> Creator profile: analytical, perceptive, responsive;
> Feature within product scope, creator ability
> Induce demand
> await "That doesn't work" => "Thanks!"
> update memory
More than once GPT-3.5 'hallucinated' an essential and logical function in an API that by all reason should have existed, but for whatever reason had not been included (yet).
I have fun asking chatbots how to clear the chat and seeing how many refer to nonexistent buttons or menu options.
Paving the folkways!
Figuring out the paths that users (or LLMs) actually want to take: not based on your original design or model of what paths they should want, but based on the paths that they actually do want and actually tread. Aka, meeting demand.
The comments are kind of concerning. First, ChatGPT did not discover unmet demand in the market. It tried to predict what a user would want and hallucinated a feature that could meet that demand. Both the demand and the feature were hallucinations. Big problem.
The user is not going to understand this. The user may not even need that feature at all to accomplish whatever it is they're doing. Alternatives may exist. The consequences will be severe if companies don't take this seriously.
Been using LLMs to code a bit lately. It's decent with boilerplate. It's pretty good at working out patterns[1]. It does like to ping pong on some edits though - edit this way, no back that way, no this way again. I did have one build an entire iOS app, it made changes to the UI exactly as I described, and it populated sample data for all the different bits and bobs. But it did an abysmal job at organizing the bits and bobs. Need running time for each of the audio files in a list? Guess we need to add a dictionary mapping the audio file ID to length! (For the super juniors out there: this piece of data should be attached to whatever represents the individual audio file, typically a class or struct named 'AudioFile'.)
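For illustration, the shape the commenter describes, written in Python rather than Swift for consistency with the other sketches here; the field names are assumptions. The running time is a property of each `AudioFile`, so there's no separate ID-to-length dictionary to keep in sync.

```python
from dataclasses import dataclass


@dataclass
class AudioFile:
    id: str
    title: str
    duration_seconds: float  # running time lives on the file it describes


def format_duration(seconds: float) -> str:
    """Format a running time as m:ss, rounding to the nearest second."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


files = [AudioFile("a1", "Intro", 65.4), AudioFile("a2", "Outro", 5.0)]
for f in files:
    print(f"{f.title} ({format_duration(f.duration_seconds)})")
```

With a side dictionary, adding or removing a file means updating two structures; here the list row just reads the field off the object it is already displaying.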
It really likes to cogitate on code from several versions ago. And it often insists repeatedly on edits unrelated to the current task.
I feel like I'm spending more time educating the LLM. If I can resist the urge to lean on the LLM beyond its capabilities, I think I can be productive with it. If I'm going to stop teaching the thing, the least it can do is monitor my changes and not try to make suggestions from the first draft of code from five days ago, alas ...
1 - e.g. a 500-line text file representing values that will be converted to enums, with varying adherence to some naming scheme - I start typing, and after correcting the first two, it suggests the next few. I accept its suggestions until it makes a mistake because the data changed, start manual edits again ... I repeated this process for about 30 lines and it successfully learned how I wanted the remainder of the file edited.
Adding a feature because ChatGPT incorrectly thinks it exists is essentially design by committee—except this committee is neither your users nor shareholders.
On the other hand, adding a feature because you believe it is a feature your product should have, a feature that fits your vision and strategy, is a pretty sound approach that works regardless of what made you think of that feature in the first place.
TDD meets LLM-driven API design.
I recall that early on a coworker was saying that ChatGPT hallucinated a simpler API than the one we offered, albeit with some easy-to-fix errors and extra assumptions that could've been nicer defaults in the API. I'm not sure if this ever got implemented, though, as he was on a different team.
That's the most promising solution to AI hallucinations: if LLM output doesn't match reality, fix reality.
We've added formant shifting to Graillon https://www.auburnsounds.com/products/Graillon.html largely because LLMs thought it already had formant-shifting.
i LOVE this despite feeling for the impacted devs and service. love me some good guitar tabs, and honestly i'd totally believe chatgpt here hah..
what a wonderful incident / bug report my god.
totally sorry for the trouble and amazing find and fix honestly.
sorry i am more amazed than sorry :D. thanks for sharing this !!
It's worth noting that behind this hallucination there were real people with ASCII tabs in need of a solution. If the result is a product-led growth channel at some scale, that's a big roadmap green light for me!
In addition, we might consider writing the scientific papers ChatGPT hallucinates!
Oh. This happened to me when asking an LLM about a database server feature. It enthusiastically hallucinated that they have it when the correct answer was 'no dice'.
Maybe I'll turn it into a feature request then ...
I wonder if we'll ever get to the point I remember reading about in a novel (AI initially based on emails), where the human population is gently nudged toward individuals who, in aggregate, benefit the AI's goals.
If nothing else, I at least get vindication from hallucinations. "Yes, I agree, ChatGPT, that (OpenSSL manpage / ffmpeg flag / Python string function) should exist."
Amazing story.
Had something similar happen to us with our dev-tools SaaS. Non-devs started coming to the product because GPT told them about it. We had to change parts of the onboarding and integration to accommodate non-devs, who were having a harder time reading the documentation and understanding what to do.
Chatbot advertising has to be one of the most powerful forms of marketing yet. People are basically all the way through the sales pipeline when they land on your page.
So this model of ChatGPT obviously has been trained with the July 2028 dataset by mistake, including this discussion.
It'll all be fine in a few years. :-;
This reminds me of how software integrators or implementers worked a couple of decades back. They were IT contractors who implemented a popular software product such as IBM MQ or SAP at a client site and maintained it. They sometimes incorrectly claimed that some feature existed, and after finding out that it didn't, they created a ticket asking the software vendor for it as a patch release.
Funny this article is trending today because I had a similar thought over the weekend - if I'm in Ruby and the LLM hallucinates a tool call...why not metaprogram it on the fly and then invoke it?
If that's too scary, the failed tool call could trigger another AI to go draft up a PR with that proposed tool, since hey, it's cheap and might be useful.
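Python has a rough analogue of Ruby's `method_missing`: `__getattr__`, which is only called when normal attribute lookup fails. A sketch of the "less scary" variant, assuming tool calls arrive as a name plus keyword arguments; the `ToolBox` class and its `search_docs` tool are hypothetical. Instead of generating code on the fly, invented tools are logged so a person (or another agent) can review them later.

```python
from collections import Counter


class ToolBox:
    """Hypothetical tool registry: invented tools are logged, not executed."""

    def __init__(self):
        self.missing_calls = Counter()

    def search_docs(self, query: str) -> str:
        return f"results for {query!r}"

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for a tool the model made up.
        if name.startswith("_"):
            raise AttributeError(name)  # don't intercept dunder/private lookups

        def stub(*args, **kwargs):
            self.missing_calls[name] += 1
            return {"error": f"tool {name!r} does not exist", "logged_for_review": True}

        return stub


tools = ToolBox()
# A model's tool call, dispatched by name.
result = getattr(tools, "import_ascii_tab")(text="e|---0---|")
print(result, tools.missing_calls)
```

Returning an explicit error to the model keeps it from assuming the call succeeded, while the counter shows which invented tools come up often enough to be worth drafting.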
What made ChatGPT think that this feature is supported? And a follow up question - is that the direction SEO is going to take?
hallucination driven development
Can this sheet-music scanner also expand works so they don't contain loops, essentially removing all repeat-signs?
Wow! What if we all did this? What is the closure of the feature set that ChatGPT can imagine for your product? Is it one that is easy for ChatGPT to use? Is it sound and complete for your use cases? Is it the best you could have built had you had clear requirements upfront?
Well, I also learned that the developers of this tool are looking at the images their users upload.
I think this is the best way to build features. Build something that people want! If people didn't want it, ChatGPT wouldn't recommend it. You got a free ride on the back of a multibillion-dollar monster; I can't see what's wrong with that.
Beyond the blog: Going to be an interesting world where these kinds of suggestions become paid results and nobody has a hope of discovering your competitive service exists. At least in that world you'd hope the advertiser actually has the feature already!
If there is strong demand for a feature, regardless of the source of the request, that's good enough reason to add it.
ChatGPT wasn't wrong, it was early. It always knew you would deploy it.
"Would you still have added this feature if ChatGPT hadn't bullied you into it?" Absolutely not.
I feel like this resolves several longstanding time travel paradox tropes.
AI is of, for, and by vibe coders who don't care about the details.
How fast is that new feature growing? Is it a killer feature?
The problem with LLMs is that in 99% of cases, they work fine, but in 1% of cases, they can be a huge liability, like sending people to wrong domains or, worse, phishing domains.
Oh my, people complaining about getting free traffic from ChatGPT... While most businesses are worried about all their inbound traffic drying up as search engine use declines.
Pretty goofy but I wonder if LLM code editors could start tallying which methods are hallucinated most often by library. A bad LSP setup would create a lot of noise though.
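A rough sketch of such a tally in Python, assuming the editor can report the module and dotted attribute path the LLM tried to use; `hallucinations` and `check_call` are hypothetical names. It only checks attributes that exist at runtime after import, so anything an editor can't resolve the same way (or that is created lazily) would be counted as hallucinated, a false positive similar in spirit to the LSP noise the comment worries about.

```python
import importlib
from collections import Counter, defaultdict

# module name -> Counter of attribute paths that did not resolve
hallucinations = defaultdict(Counter)


def check_call(module_name: str, attr_path: str) -> bool:
    """Return True if module_name.attr_path exists; otherwise tally it as hallucinated."""
    obj = importlib.import_module(module_name)
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            hallucinations[module_name][attr_path] += 1
            return False
        obj = getattr(obj, part)
    return True


print(check_call("json", "dumps"))          # True
print(check_call("json", "dump_pretty"))    # False: no such function
print(check_call("builtins", "str.strip"))  # True
```

Aggregated across many users, `hallucinations["json"].most_common()` would be the per-library list the comment imagines.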
Wouldn't some GEO tool like AthenaHQ help with this?
Is this going to be the new wave of improving AI accuracy? Making the incorrect answers correct? I guess it’s one way of achieving AGI.
Loved this article. If you can adapt to the market (even if the AI did that) you can provide your users a greater experience.
slightly off topic, but on the topic of AI coding agents making up APIs and features that don’t exist: I’ve had good success with Q by telling it to “check the sources to make sure the apis actually exist”. sometimes it will even request to read/decompile (Java) sources, and run grep and find commands to find out what methods the API actually contains
Time travelling AGI confirmed, to the moon!
Sometimes you plan for features that aren’t actually there. I found using mailsAI helped me focus on what’s really available, which made managing expectations easier. It’s a simple way to keep things clear.
> Should we really be developing features in response to misinformation?
Creating the feature means it's no longer misinformation.
The bigger issue isn't that ChatGPT produces misinformation - it's that it takes less effort to update reality to match ChatGPT than it takes to update ChatGPT to match reality. Expect to see even more of this as we march toward accepting ChatGPT's reality over other sources.
> We’ve got a steady stream of new users [and a fun blog post]
Neat
> My feelings on this are conflicted
Doubt
Is this the first AI hallucinated desire path?
And LLMs started to tell people what to do :DDD.
Will you use ChatGPT to implement the feature?
What the hell, we elect world leaders based on misinformation, why not add s/w features for the same reason?
In our new post truth, anti-realism reality, pounding one's head against a brick wall is often instructive in the way the brain damage actually produces great results!
Forget prompt engineering, how do you make ChatGPT do this for anything you want added to your project that you have no control over? Lol
So now the machines ask for features and you're the one implementing them. How the turns have tabled...
That's a riot!
ChatGPT routinely hallucinates API calls, making them up from whole cloth. "Apple Intelligence" creates variants of existing API calls, usually by adding nonexistent arguments.
Both of them will hallucinate API calls that are frequently added by programmers through extensions.
That's a very constructive way of responding to AI being hot trash.
You're now officially working for the machine, congrats.
love this
It's the new form of giving into the customer, lol
Why would anyone think this is a bad thing as the article hints?
"We’ve got a steady stream of new users" and it seems like a simple feature to implement.
This is the exact chaos AI brings that's wonderful. Forcing us to evolve in ways we didn't think of.
I can think of a dozen reasons why this might be bad, but I see no reason why they have more weight than the positive here.
Take the positive side of this unknown and run with it.
We have decades more of AI coming up, Debbie Downers will be left behind in the ditch.
"Should we really be developing features in response to misinformation?"
No, because you'll be held responsible for the misinformation being accurate: users will say it is YOUR fault when they learn stuff wrong.
This feels like a dangerously slippery slope. Once you start building features based on ChatGPT hallucinations, where do you draw the line? What happens when you build the endpoint in response to the hallucination, and then the LLM starts hallucinating new params / headers for the new endpoint?
- Do you keep bolting on new updates to match these hallucinations, potentially breaking existing behavior?
- Or do you resign yourself to following whatever spec the AI gods invent next?
- And what if different LLMs hallucinate conflicting behavior for the same endpoint?
I don’t have a great solution, but a few options come to mind:
1. Implement the hallucinated endpoint and return a 200 OK or 202 Accepted, but include an X-Warning header like "X-Warning: The endpoint you used was built in response to ChatGPT hallucinations. Always double-check an LLM's advice on building against 3rd-party APIs with the API docs themselves. Refer to https://api.example.com/docs for our docs. We reserve the right to change our approach to building against LLM hallucinations in the future." Most consumers won’t notice the header, but it’s a low-friction way to correct false assumptions while still supporting the request.
2. Fail loudly: Respond with 404 Not Found or 501 Not Implemented, and include a JSON body explaining that the endpoint never existed and may have been incorrectly inferred by an LLM. This is less friendly but more likely to get the developer’s attention.
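Both options can be sketched as a plain function returning `(status, headers, body)`, independent of any web framework. The endpoint path `/v1/tabs/import` and the function name are hypothetical, and the `X-Warning` header and docs URL are the comment's own placeholders; option 1 uses 200 and option 2 uses 501, though 202 and 404 would fit the descriptions equally well.

```python
import json

DOCS_URL = "https://api.example.com/docs"  # placeholder URL from the comment


def respond_to_inferred_endpoint(path: str, serve_anyway: bool):
    """Return (status, headers, body) for an endpoint an LLM invented.

    serve_anyway=True:  option 1, handle it but attach a warning header.
    serve_anyway=False: option 2, fail loudly with 501 and an explanatory body.
    """
    if serve_anyway:
        headers = {
            "Content-Type": "application/json",
            "X-Warning": f"{path} was built in response to LLM hallucinations; see {DOCS_URL}",
        }
        return 200, headers, json.dumps({"ok": True})
    body = {
        "error": "not_implemented",
        "detail": f"{path} never existed and may have been inferred by an LLM.",
        "docs": DOCS_URL,
    }
    return 501, {"Content-Type": "application/json"}, json.dumps(body)


status, headers, body = respond_to_inferred_endpoint("/v1/tabs/import", serve_anyway=False)
print(status, json.loads(body)["error"])  # 501 not_implemented
```

Keeping the decision in one function makes it easy to flip an endpoint from option 2 to option 1 if it turns out to be worth supporting.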
Normally I'd say that good API versioning would prevent this, but it feels like that all goes out the window unless an LLM user thinks to double-check what the LLM tells them against actual docs. And if that had happened, it seems like they wouldn't have built against a hallucinated endpoint in the first place.
It’s frustrating that teams now have to reshape their product roadmap around misinformation from language models. It feels like there’s real potential here for long-term erosion of product boundaries and spec integrity.
EDIT: for the down-voters, if you've got actual qualms with the technical aspects of the above, I'd love to hear them and am open to learning if / how I'm wrong. I want to be a better engineer!
True anti-luddite behavior
> We ended up deciding: what the heck, we might as well meet the market demand.
this is my general philosophy and, in my case, this is why I deploy things on blockchains
so many people keep wondering whether there will ever be some mythical, unfalsifiable, hard-to-define “mainstream” use case, and ignoring that crypto natives just … exist. and have problems they will pay (a lot) to solve.
to the author’s burning question about whether any other company has done this, I would say yes. I’ve discovered services recommended by ChatGPT and other LLMs that didn’t do what was described of them, and they subsequently tweaked them once they figured out there was new demand.
If you build on LLMs, you can have unknown features. I was going to add an automatic translation feature to my natural language network scanner at http://www.securday.com, but apparently ChatGPT 4.1 does automatic translation, so I didn’t have to add it.
I've found this to be one of the most useful ways to use (at least) GPT-4 for programming. Instead of telling it how an API works, I make it guess, maybe starting with some example code to which a feature needs to be added. Sometimes it comes up with a better approach than I had thought of. Then I change the API so that its code works.
Conversely, I sometimes present it with some existing code and ask it what it does. If it gets it wrong, that's a good sign my API is confusing, and how.
These are ways to harness what neural networks are best at: not providing accurate information but making shit up that is highly plausible, "hallucination". Creativity, not logic.
(The best thing about this is that I don't have to spend my time carefully tracking down the bugs GPT-4 has cunningly concealed in its code, which often takes longer than just writing the code the usual way.)
There are multiple ways that an interface can be bad, and being unintuitive is the only one that this will fix. It could also be inherently inefficient or unreliable, for example, or lack composability. The AI won't help with those. But it can make sure your API is guessable and understandable, and that's very valuable.
Unfortunately, this only works with APIs that aren't already super popular.