Ben Mann gave a speech at Manifest 2024 on June 9th, 2024. Although the transcript, along with audio of a female text-to-speech voice reading it, was featured on his Substack on June 10th, the video of his speech was not posted until July 29th, with the Q&A removed; it sits at 212 views as I type this, six days later. As far as I know, the Q&A portion doesn't exist in any format online. The speech gained much more attention on August 1st, after a clip was posted to X by an AI safety account.
https://benjmann.substack.com/p/ben-mann-monthly-may-2024
https://docs.google.com/document/d/e/2PACX-1vSo9iQrO2PHYEBSkF_P8BCQYrLBIeVdVy6HjHnE1ITwMueMw1GJ3nzz3eszPYQZZlaWydeMWFP_b9be/pub
https://youtube.com/watch?v=HZRcmUkAAQE
https://x.com/ai_ctrl/status/1819173703869255879
Highlights from his speech:
"While Claude 3 demonstrated increased capabilities compared to its predecessor, our team determined that it did not cross the threshold into ASL-3 territory. However, we did note that with additional fine-tuning and improved prompt engineering, there is a 30% chance that the model could have met our current criteria for autonomous replication..."
"The most intriguing test was for autonomous replication. This is our attempt to see if the AI could start to act on its own, without human guidance. We set up a suite of tasks that included things like breaking out of a digital sandbox (a controlled environment), manipulating its environment through a command line (like typing commands into a computer), and even writing self-modifying code. It's a bit like testing whether an AI could play 'escape the room' games without human help.
Here, Claude 3 showed some real progress, solving parts of these tasks. But we had set a high bar: the model would need to succeed at 50% of the tasks to raise serious concerns. Claude 3 didn't reach that level, but its performance was notable enough for us to take notice."
"In more extreme scenarios, we could even see discontinuities where safety progress falls so far behind capabilities that we have to essentially pause all model development for a period of time until safety catches up. Since the release of our RSP last September, most of the other leading AI labs took inspiration and have shared similar commitments. Pausing progress is a real possibility that I believe more AI forecasts need to take into account."
The biggest detail for safety-ists is that "with additional fine-tuning and improved prompt engineering, there is a 30% chance" the model could have accomplished "autonomous replication". This information may have appeared in a study somewhere, but I haven't found any news articles that mention it, despite it sitting in a Google Doc linked from the Substack of a cofounder of Anthropic. Consequently, I think this is new information to a lot of e/acc and AI safety people.
I can't help but feel this isn't a surprise, given the incredible abilities already demonstrated by Claude 3, or, for that matter, GPT-4 or Llama 3 405B. Given everything I've already seen, I would have expected all the major tech firms, including Cisco, Cloudflare, Oracle, Palo Alto Networks, and IBM, to be more vocal and serious about restricting the progress of frontier AI until proper safety measures are in place. GPT-5 is expected to release this year. The more progress is made, the more work must be done to harden cybersecurity. Does everyone at the top believe that cybersecurity will benefit enough from AI to cancel out all of the cybersecurity detriments?