Training versions of AI models on classified data is expected to make them more accurate and efficient at certain tasks, a US defense official speaking on background told MIT Technology Review. The news comes at a time when demand for more powerful models is high: the Pentagon has contracts with OpenAI and Elon Musk’s xAI to run their models in classified settings, and it is pursuing a new agenda of becoming an “AI-first” combat force as the conflict with Iran escalates. (The Pentagon did not comment on its AI training plans by the time of publication.)
The training will be conducted in a secure data center accredited to host classified government projects, where a copy of an AI model is linked to the classified data, according to two people familiar with how such operations work. Although the Defense Department will own the data, officials from the AI companies who hold the proper security clearances will be able to access it, the official said.
Before allowing this new training, though, the official said, the Pentagon plans to evaluate how accurate and effective the models are when trained on unclassified data, such as commercially available satellite imagery.
The military has long used computer vision models, an older form of AI, to identify objects in the images and footage it collects from drones and aircraft, and federal agencies have awarded contracts to companies to train AI models on such content. AI companies that make chatbots powered by large language models (LLMs) have also created versions of their models suited to government tasks, such as Anthropic’s Claude Gov, designed to work in more languages and in secure environments. But the official’s comments are the first indication that AI companies building LLMs, such as OpenAI and xAI, could train government-specific versions of their models directly on classified data.
Alok Mehta, who directs the Wadhwani AI Center at the Center for Strategic and International Studies and previously led AI policy efforts at Google and OpenAI, says training on classified data, as opposed to just answering questions about it, will present new risks.