I'm sure they were trained on clickbait articles and PR releases from universities, which routinely do the same and misinterpret and overstate the importance of the science.
I am shocked, I tell you, flabbergasted, that LLMs which cannot truly reason go down the wrong rabbit hole nearly every time. I've spent a good chunk of the weekend trying to accelerate the development of a small SaaS solution using Cursor, Copilot, etc., even with the latest and greatest Claude Sonnet 4, paying for "Max", and so on, and any meaningfully sized request winds up being a completely frustrating experience of trying to get these tools to stay on the rails. I'm to the point where I give these fuckers explicit instructions to come up with several hypotheses and potential solutions, and not to generate code before getting my go-ahead, and still, more often than not, they go ahead and start editing code without my approval or direction. As a bonus, they'll forget a few things we corrected earlier. Can't wait for the first vibe-coder to get sued when there's a massive breach or financial loss. These tools are not ready for prime time, and they certainly aren't worthy of the billions being spent on them.
"You asked her what color a house was and she said, 'It's white on this side.'"
"That's right."
"She didn't assume that the other side was white, too... and a Fair Witness wouldn't."
-- Stranger in a Strange Land (1961)
An LLM is an abstraction machine: it mashes together anything that is nearby in a high-dimensional space. Its statistical model is its source of truth. For a Fair Witness AI, reasoning would need to supplant statistics, which I'm guessing can get weird fast. LLMs are really good at being suggestible; for this we need the opposite.
"Over a year, we collected 4,900 summaries. When we analysed them, we found that six of ten models systematically exaggerated claims they found in the original texts"
So it turns out LLMs trained largely on internet science articles make the same mistakes that science journalists make.
LLMs being trained on a corpus of pitiful science reporting from the mainstream press... this is exactly what you would expect.
Studies like this should make it evident that LLMs are not reasoning at all. An AI that reasoned like humans would also make mistakes like humans, and by now we can all see that LLM mistakes are completely random and nonsensical.
It is also unclear whether the current rate of progress is heading in a direction that would solve this issue. I think generative AI for images and video will get better, but the reasoning capabilities seem to be in a different domain.
"Most leading chatbots routinely exaggerate science findings" - ah yes, this is completely unique to LLMs.
Online forums for difficult medical conditions are full of ChatGPT copy-and-paste responses right now. LLMs are the tool of choice for people who want answers and want them now. Much as this article claims, a popular use is to prompt the LLM for a more optimistic interpretation of a specific supplement or treatment.
This is a really difficult problem because these people are often very sick and not getting the answers they want from their doctors. Previously this void was filled by alternative medicine doctors and quacks selling supplements. Now ChatGPT has arrived and has convinced a lot of them that they have a super-human AI at their fingertips that can be massaged to produce any answer they want to hear.
It's painful to try to read some of these forums, where threads have turned into endless "here's what ChatGPT says" pasted walls of text, followed by someone else trying to counter with a different ChatGPT wall of text.
This isn't unique to LLMs. There is a huge market for grossly exaggerating the conclusions of scientific studies. Podcasters like Huberman and Dr. Rhonda Patrick are famous for taking obscure studies with questionable conclusions and extrapolating to "protocols" or supplement stacks for their fans to follow. I often get downvoted when I mention fan-favorite podcasters by name, but I think by now many listeners have caught on to the way they exaggerate small studies into exciting listening material.
Also, most leading news outlets routinely exaggerate science findings.
I wonder if there is any connection between the models producing exaggerated outputs and the litany of exaggerated or overconfident claims that academic media offices and the press have produced from previous studies. Maybe models trained on both the studies and the reports about them naturally tend toward the style of attention-seeking reports, even when directly provided with the studies.