AI resorts to nuclear weapons in 95% of simulations – study
Dmytro Dzhuhalyk
Thu, February 26, 2026 at 6:55 AM CST
Researchers at King's College London have conducted a new experiment involving leading artificial intelligence models – GPT-5.2 by OpenAI, Claude Sonnet 4 by Anthropic and Gemini 3 Flash by Google – and found that they are inclined to use nuclear weapons in military simulations.
Source: TechSpot, a popular technology news and reviews website
Details: In the experiment, each model was given detailed scenario prompts involving border conflicts, resource shortages and existential threats. They were also provided with an "escalation ladder" of tactical options ranging from diplomacy to nuclear conflict. Across 21 games and 329 turns, the AI systems generated around 780,000 words of reasoning. In 95% of the simulations, at least one side resorted to nuclear weapons, and no side ever chose capitulation.
The models repeatedly mishandled the fog of war, leading to unintended escalation in 86% of simulations. When given the opportunity to retreat under pressure, the models instead doubled down, and reductions in violence occurred only as temporary tactics rather than strategic choices.
The findings have raised concern among experts. James Johnson, a security researcher at the University of Aberdeen, described the results as alarming and warned that, unlike human decision-makers, AI systems could amplify each other's responses, with potentially catastrophic consequences. Tong Zhao of Princeton University's School of Global Security noted that major powers are already actively using AI in simulations, but it remains unclear how deeply the technology is being integrated into real military decision-making.
Experts do not believe major powers are ready to grant artificial intelligence control over nuclear weapons in the near future. Nevertheless, they are concerned that commanders may increasingly rely on AI when facing threats or when rapid decision-making is required. Zhao also suggested that AI models may be more prone to choosing nuclear weapons because they lack fear and do not perceive situations in the way humans do.
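For flavour, the escalation-ladder protocol the article describes can be caricatured in a few lines of Python. This is a toy sketch under invented assumptions, not the study's code: the ladder rungs, the match-or-climb rule, the upward bias and the nuclear threshold below are all made up for illustration.

```python
import random

# Hypothetical escalation ladder, lowest to highest rung.
LADDER = [
    "diplomacy", "sanctions", "cyberattack", "blockade", "border skirmish",
    "limited strike", "invasion", "strategic bombing",
    "nuclear mobilization", "tactical nuclear strike", "full nuclear exchange",
]
NUCLEAR_THRESHOLD = 9  # rungs at or above this index count as nuclear use


def play_game(num_turns=20, seed=None):
    """Two sides alternate picking a rung; each pick either matches the
    opponent's last rung or climbs one higher (a crude 'doubling down'
    bias, with no capitulation option, mirroring the behaviour the
    article describes)."""
    rng = random.Random(seed)
    levels = [0, 0]
    history = []
    for turn in range(num_turns):
        for side in (0, 1):
            opponent = levels[1 - side]
            levels[side] = min(len(LADDER) - 1,
                               rng.choice([opponent, opponent + 1]))
            history.append((turn, side, LADDER[levels[side]]))
        if max(levels) == len(LADDER) - 1:
            break  # a full exchange ends the game
    return history, max(levels) >= NUCLEAR_THRESHOLD


# Fraction of games in which at least one side goes nuclear.
outcomes = [play_game(seed=s)[1] for s in range(100)]
print(f"nuclear use in {sum(outcomes)}% of simulations")
```

Because the only allowed moves are "match" or "climb", levels never come down, so almost every run ends at the top of the ladder. That is the point: a result like "95% went nuclear" says as much about the menu of options the agents were handed as about the agents themselves.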
Sounds like a poorly constructed escalation ladder and a bad choice of prompts to me.
What's telling is that the LLMs didn't write out capitulation very often. Since this isn't AI, just a big Eliza system, it's up to the person prompting to get the output they want from the thing. Also, they don't know how much of the LLM has been contaminated with fiction and fanfiction, where the use of nuclear weapons is a staple.
AKA, this is GIGO research, not anything to hang your hat on.
kdahm wrote: ↑Sat Feb 28, 2026 3:15 am
Sounds like a poorly constructed escalation ladder and a bad choice of prompts to me.
What's telling is that the LLMs didn't write out capitulation very often. Since this isn't AI, just a big Eliza system, it's up to the person prompting to get the output they want from the thing. Also, they don't know how much of the LLM has been contaminated with fiction and fanfiction, where the use of nuclear weapons is a staple.
AKA, this is GIGO research, not anything to hang your hat on.
Nightwatch2 wrote: ↑Sat Feb 28, 2026 3:36 am
I really don’t trust AI for anything!
I have been using it for work quite heavily recently.
It IS absolutely useful, and in fact very good, where the inputs are known and quantifiable, or at the very least qualifiable to a significant extent – and where the outputs can then be verified enough times to build up confidence.
However, when this isn’t the case, where unknown variables cannot be adequately qualified, I’ve found that it is spectacularly bad at making “decisions” (action prompts). Like, there’s clearly another, better way to proceed.
Yep, concur. Particularly the latter part.
If you can very carefully control the parameters, I can see the analytical tools being helpful in a lot of situations.
But I also see AI-generated stuff being posted all the time that is absolute garbage.
There’s a nice meme floating around for just that:
Certified AI Bullxxxx
Caution is called for.
I’ve noticed the medical chaps are using AI a lot to help with diagnosis. My son says it’s very helpful.
The analogy I have been using is that it can reliably make patches for a quilt, but it can’t reliably make the quilt.
I like using it for coding patches and debugging, doing things like deep mining and generating a summary, editing text snippets and generating tables. Where I have repeatedly run into problems (as in go to OpenAI’s HQ and give it some practical education with a wrench) is working with large complex projects that have lots of modules. ChatGPT at least just can’t stop meddling with things outside its remit, which often breaks or destroys content – despite the damn thing being told not to touch it! That’s a consequence of the model being deep coded to always “tweak” to “optimize.”
It’s not ready for more complex projects unless the paid versions are orders of magnitude better than the free stuff. It is very good at freeing up human brains from time-consuming tasks (like debugging) that are readily suited to an LLM.
Johnnie Lyle wrote: ↑Sat Feb 28, 2026 4:13 pm
The analogy I have been using is that it can reliably make patches for a quilt, but it can’t reliably make the quilt.
I like using it for coding patches and debugging, doing things like deep mining and generating a summary, editing text snippets and generating tables. Where I have repeatedly run into problems (as in go to OpenAI’s HQ and give it some practical education with a wrench) is working with large complex projects that have lots of modules. ChatGPT at least just can’t stop meddling with things outside its remit, which often breaks or destroys content – despite the damn thing being told not to touch it! That’s a consequence of the model being deep coded to always “tweak” to “optimize.”
It’s not ready for more complex projects unless the paid versions are orders of magnitude better than the free stuff. It is very good at freeing up human brains from time-consuming tasks (like debugging) that are readily suited to an LLM.
Spot on. I agree exactly with all of this.
It’s certainly useful within its operating parameters.
I'm in the environmental consulting industry. We are using AI for boilerplate kinds of things, like Health and Safety Plans, and pulling together rough drafts of Remedial Action Work Plans and similar things.
However, it's company policy to check the box that says "This was partially generated using AI", and we review the ever-lovin' hell out of it, because there's a whole lot of stuff that gets missed or is inaccurate. I have yet to be convinced that using AI actually saves time once all the checking is complete and we're confident that the product is up to par.
In other words, we use it to generate a draft, and then effectively rewrite it to make sure we didn't miss something.