Anthropic's SHADE-Arena: Evaluating sabotage and monitoring in LLM agents

by thoughtpeddler on 6/17/2025, 8:01 AM with 0 comments
