Hacker News

by miles on 6/20/2025, 9:28 PM with 1 comment

by chiph2o on 6/20/2025, 10:03 PM

in-context scheming = alignment red flag

More capability + low clarity on intent = low trust

More capable models are better at in-context scheming