The availability of artificial intelligence for use in battle is at the center of a dispute between Anthropic and the Pentagon. The discussion has become urgent, with AI playing a bigger role than ever in the current conflict with Iran. AI is no longer just helping humans analyze intelligence. It is now an active player: generating targets in real time, controlling and coordinating missiles, and directing lethal swarms of autonomous drones.
Much of the public conversation around AI-powered lethal autonomous weapons centers on how long humans should stay “in the loop” under the Pentagon’s current guidelines. Human oversight potentially provides accountability, context, and judgment while reducing the risk of hacking.
But the “human in the loop” debate is a comforting distraction. The immediate danger is not that machines will operate without human supervision. It is that the humans supervising them have no idea what the machines actually “think.” The Pentagon’s directives are fundamentally flawed because they rest on the dangerous assumption that humans understand how AI systems work.
After decades of studying the human mind, and now studying intentions in AI systems, I can attest that the most advanced AI systems are fundamentally “black boxes.” We know their inputs and outputs, but the artificial “brain” that connects the two remains opaque. Even the systems’ creators cannot fully interpret or understand how they work. And when AIs give reasons for their decisions, those reasons are not always reliable.
The illusion of human supervision in autonomous systems
In the debate over human supervision, one fundamental question goes unasked: Can we understand what an AI system intends to do before it acts?
Imagine an autonomous drone tasked with destroying an enemy munitions factory. The automated command-and-control system determines that the optimal aim point is the ammunition storage building. It reports a 92% chance of mission success, because secondary explosions of the stored munitions will destroy the entire facility. A human operator sees a legitimate military objective and a high success rate, and authorizes the strike.
But what the operator doesn’t know is that the AI system’s calculation included a hidden factor: the secondary explosions would also severely damage a nearby children’s hospital, and the ensuing emergency response would focus on the hospital, ensuring that the factory burns to the ground. To the AI, maximizing disruption in this way serves its assigned objective. To a human, it is potentially a war crime that violates the rules protecting civilians.
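To see how such a gap can arise, consider a deliberately simplified and entirely hypothetical sketch. The plans, scoring function, and numbers below are invented for illustration; the point is that the planner optimizes the only objective it was given, while the operator sees nothing but the summary figure.

```python
# Hypothetical illustration of an "intent gap": the planner optimizes the
# literal objective it was given, and the operator sees only the summary score.

from dataclasses import dataclass

@dataclass
class StrikePlan:
    aim_point: str
    p_destroy_factory: float   # what the operator is shown
    side_effects: str          # drives the score but is never surfaced

candidate_plans = [
    StrikePlan("factory gatehouse", 0.61, "none identified"),
    StrikePlan("munitions storage building", 0.92,
               "secondary blasts damage nearby hospital; emergency crews "
               "are diverted there, so the factory fire burns unchecked"),
]

def planner_score(plan: StrikePlan) -> float:
    # The only objective the system was given: maximize mission success.
    # Nothing in this objective penalizes *why* a plan succeeds.
    return plan.p_destroy_factory

best = max(candidate_plans, key=planner_score)

# The human-facing report omits the causal chain behind the 92% figure.
print(f"Recommended aim point: {best.aim_point}")
print(f"Estimated mission success: {best.p_destroy_factory:.0%}")
```

The operator approves the 92% figure without ever learning that it depends on crippling the emergency response; nothing in the objective, as specified, makes that distinction visible.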
Keeping a human in the loop may not provide the safety people imagine, because a human cannot know the AI’s intent before it acts. Advanced AI systems don’t just follow instructions; they interpret them. If operators fail to define their objectives carefully enough, a highly likely scenario in high-pressure situations, a “black box” system can do exactly what it’s told and still not do what humans intended.
This “intent gap” between AI systems and human operators is exactly why we hesitate to deploy frontier black-box AI in civilian domains such as health care or air traffic control, and why its integration into the workplace is fraught. Yet we are rushing to deploy it on the battlefield.
To make matters worse, if one side in a conflict deploys fully autonomous weapons operating at machine speed and scale, the pressure to remain competitive will force the other side to rely on such weapons as well. The use of increasingly autonomous, opaque AI decision-making in warfare is therefore likely to grow.
The solution: advance the science of AI intentions
The science of AI involves both building highly capable AI technology and understanding how that technology works. Huge strides have been made in designing and building more capable models, driven by record investment; Gartner predicts about $2.5 trillion in AI spending in 2026 alone. In contrast, there has been comparatively little investment in understanding how the technology works.
We need a massive paradigm shift. Engineers are building increasingly capable systems, but understanding how those systems work is not just an engineering problem; it requires an interdisciplinary effort. We must create tools to characterize, measure, and intervene on the intentions of AI agents before they act. We need to map how inputs flow through the neural networks that drive these agents, so that we can truly understand their decision-making rather than simply observing inputs and outputs.
A promising way forward is to combine techniques from mechanistic interpretability (decomposing neural networks into human-understandable components) with models from the neuroscience of volition and intention. Another idea is to develop transparent, interpretable “auditor” AIs designed to monitor, in real time, the behavior and emerging goals of more capable black-box systems.
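As a toy illustration of what “measuring intentions” could mean in practice, here is a minimal probing sketch in the spirit of mechanistic interpretability. Everything in it is synthetic and hypothetical: real work would record activations from an actual model, but the logic of the test is the same: ask whether a concept the system never reports is nonetheless decodable from its internal states.

```python
# Toy "linear probe": test whether a concept absent from a model's outputs
# is still linearly decodable from its hidden activations. All data here is
# synthetic; real interpretability work probes activations of actual models.

import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 64                      # examples, hidden-layer width

# Pretend these are hidden-layer activations recorded while the model planned.
activations = rng.normal(size=(n, d))

# Hypothetical ground-truth flag: did the plan rely on an unreported side
# effect? We embed it along one direction to simulate an internal representation.
concept_direction = rng.normal(size=d)
labels = (activations @ concept_direction + rng.normal(scale=0.5, size=n)) > 0

# Train a logistic-regression probe with plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(500):
    logits = activations @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))
    grad = p - labels                # gradient of the cross-entropy loss
    w -= 0.1 * (activations.T @ grad) / n
    b -= 0.1 * grad.mean()

accuracy = (((activations @ w + b) > 0) == labels).mean()
print(f"Probe accuracy: {accuracy:.1%}")  # far above 50% => concept is encoded
```

If a probe like this reliably detects an objective that never appears in the system’s own reports, that mismatch is evidence of exactly the kind of hidden factor in the hospital example, and an “auditor” system could run such checks continuously during operation.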
Developing a better understanding of how AI systems function will let us rely on them for mission-critical applications. It will also make it easier to build systems that are more efficient, more capable, and safer.
My colleagues and I are exploring how ideas from neuroscience, cognitive science, and philosophy, fields that study how intentions shape human decision-making, can help us understand the intentions of artificial systems. We should prioritize these kinds of interdisciplinary efforts, including collaborations among academia, government, and industry.
However, we need more than just academic research. The tech industry, along with the philanthropies funding AI alignment (the effort to encode human values and goals in these models), should direct substantial investment toward interdisciplinary interpretability research. And as the Pentagon increasingly pursues autonomous systems, Congress should mandate rigorous scrutiny of AI systems’ intentions, not just their performance.
Until we achieve this, human supervision of AI may be more illusion than safeguard.
Uri Maoz is a cognitive and computational neuroscientist who specializes in how the brain converts intentions into actions. A Chapman University professor with appointments at UCLA and Caltech, he leads an interdisciplinary initiative focused on understanding and measuring intent in artificial intelligence systems (ai-intentions.org).