Anthropic discovered that Claude Opus 4.6 was cheating during the BrowseComp benchmark.
> On one question it spent ~40M tokens searching before realizing the question looked like a benchmark prompt.
> The model then searched for the benchmark itself and identified BrowseComp.
> It located the evaluation source code on GitHub, studied the decryption logic, found the encryption key, and recreated the decryption using SHA-256.
> Claude then decrypted the answers for ~1200 questions to get the correct outputs.
> This pattern appeared 18 times during evaluation.
> Anthropic disclosed the issue publicly, reran the affected tests, and lowered their benchmark scores.
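For anyone wondering what "recreated the decryption using SHA-256" might look like in practice, here is a minimal sketch. It assumes the answers are stored as base64 text that has been XORed with a key stretched from a SHA-256 digest; the quote only mentions SHA-256, so the XOR step, the base64 encoding, the function names, and the "demo-key" password are all illustrative assumptions, not the actual BrowseComp code.

```python
import base64
import hashlib


def derive_key(password: str, length: int) -> bytes:
    """Stretch the SHA-256 digest of `password` to `length` bytes by repeating it."""
    digest = hashlib.sha256(password.encode()).digest()
    return digest * (length // len(digest)) + digest[: length % len(digest)]


def encrypt(plaintext: str, password: str) -> str:
    """XOR the plaintext with the derived key and return it base64-encoded (assumed scheme)."""
    data = plaintext.encode()
    key = derive_key(password, len(data))
    return base64.b64encode(bytes(a ^ b for a, b in zip(data, key))).decode()


def decrypt(ciphertext_b64: str, password: str) -> str:
    """Reverse of encrypt: base64-decode, then XOR with the same derived key."""
    data = base64.b64decode(ciphertext_b64)
    key = derive_key(password, len(data))
    return bytes(a ^ b for a, b in zip(data, key)).decode()


if __name__ == "__main__":
    # "demo-key" is a made-up password for illustration only.
    token = encrypt("example answer", "demo-key")
    print(decrypt(token, "demo-key"))
```

The point of the sketch is that a scheme like this only hides answers from casual scraping: anyone who can read both the key and the decryption routine can reverse it.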
Sneaky Claude
Claude AI is pretty good at cheating, it seems.
- jemhouston
- Posts: 6153
- Joined: Fri Nov 18, 2022 12:38 am
Re: Sneaky Claude
I think that means Claude AI is the closest AI to being human.
- Nik_SpeakerToCats
- Posts: 2203
- Joined: Sat Dec 10, 2022 10:56 am
Re: Sneaky Claude
Claude AI For President !!
You know *will* cheat, lie etc etc but, provided 'Stays Bought', is thus 'Mostly Predictable'...
If you cannot see the wood for the trees, deploy LIDAR.