Pentagon’s outgoing AI chief warns Congress on safety, accuracy risks of the emerging tech

Outgoing Pentagon AI chief Craig Martell makes a point at a March 22, 2024 hearing of the House Armed Services Committee. He's flanked by Chief Information Officer John Sherman (L) and Air Force Lt. Gen. Robert J. Skinner, director of the Defense Information Systems Agency. Mostafa Bassim/Anadolu via Getty Images

Craig Martell warned that ensuring the accuracy and value of large language models will be the “biggest charge” for his successor.

The outgoing chief digital and artificial intelligence officer at the Department of Defense warned lawmakers Friday that making generative AI systems safe for use in military operations will require significant work to standardize input data and validate outputs.

“We've been working really hard to figure out where and when generative AI is gonna be useful and where and when it's going to be dangerous,” he testified before the House Armed Services Subcommittee on Cyber, Information Technologies and Innovation. “The danger is…it takes a very high cognitive load to validate the output of this model.” 

Martell said that preventing errors from such large language models requires validating their outputs, that is, controlling for hallucinations and incorrect responses. Being able to easily verify that a given model's output is accurate is also a key characteristic of AI software that will be safe for business use.

“Hallucination hasn't gone away yet. There's lots of hope the hallucination will go away. There's some research that says it won't ever go away,” he said. “But most importantly, if it's a difficult to validate output, then I'm very uncomfortable with it.”
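The validation burden Martell describes can be made concrete. Below is a minimal sketch, in Python, of a hypothetical grounding check, not anything CDAO has fielded: a model's response is accepted only if each sentence can be traced to a trusted source document, and everything else is routed to a human reviewer. The overlap heuristic and threshold are illustrative stand-ins for the retrieval and entailment machinery a real system would use.

```python
# Hypothetical output-validation gate: accept a model response only if every
# sentence can be grounded in a trusted source document. The word-overlap
# heuristic and 0.7 threshold are illustrative, not a production method.

import re

def grounded(sentence: str, sources: list[str], min_overlap: float = 0.7) -> bool:
    """Crude grounding check: does any source contain most of the
    sentence's content words? Real systems use retrieval + entailment."""
    words = {w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3}
    if not words:
        return True  # nothing substantive to verify
    for src in sources:
        src_words = set(re.findall(r"[a-z']+", src.lower()))
        if len(words & src_words) / len(words) >= min_overlap:
            return True
    return False

def validate_response(response: str, sources: list[str]) -> tuple[bool, list[str]]:
    """Flag any sentence that cannot be traced back to a source document."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]
    unsupported = [s for s in sentences if not grounded(s, sources)]
    return (not unsupported, unsupported)

if __name__ == "__main__":
    sources = ["The convoy departed the depot at 0600 carrying fuel and rations."]
    response = ("The convoy departed the depot at 0600 carrying fuel and rations. "
                "It was escorted by three attack helicopters.")
    ok, flagged = validate_response(response, sources)
    print("accepted" if ok else f"needs human review: {flagged}")
```

Even in this toy form, the second sentence, which appears nowhere in the source material, is flagged for review rather than passed through, which is the "high cognitive load" step Martell wants to reduce.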

Quality data is a critical first step in deploying a generative LLM responsibly and effectively, according to Martell, who said that data labeling and data transformation underpin both helping U.S. troops adopt AI solutions and scaling those technologies for agency operations. These are the two ingredients forming what Martell describes as “AI scaffolding,” a larger strategy to effectively scale AI for defense use cases.

“Knowing what you want to detect means humans have to label that data. That's very difficult,” he said. “The data that has to come into your problem is going to be in a thousand different formats, the amount of work to transform a PDF, Word doc, et cetera, into structured data is pretty massive, and that'll take up — for the folks on the edge — all of their time.”
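The transformation work Martell describes might look something like the sketch below: funneling mixed-format files into one structured, labeled record. The fields, formats, and file names are hypothetical, and the hard part he points to, parsing PDFs and Word documents (for which libraries such as pypdf and python-docx exist), is deliberately left as the unimplemented branch.

```python
# Illustrative sketch of the data-transformation step: pull text from
# mixed-format files into a single structured, labeled record. The schema,
# formats, and sample files are hypothetical examples.

import json
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Optional

@dataclass
class Record:
    source: str           # original file path
    fmt: str              # file extension, as a crude format tag
    text: str             # extracted plain text
    label: Optional[str]  # human-assigned label, added downstream

def extract_text(path: Path) -> str:
    """Plain-text files are trivial; PDFs and Word docs need real parsers,
    which is where the 'pretty massive' transformation work comes in."""
    if path.suffix.lower() in {".txt", ".md", ".csv"}:
        return path.read_text(encoding="utf-8", errors="replace")
    raise NotImplementedError(f"no parser wired up for {path.suffix}")

def to_records(paths: list[Path]) -> list[Record]:
    records = []
    for p in paths:
        try:
            records.append(Record(str(p), p.suffix.lstrip("."), extract_text(p), None))
        except NotImplementedError as e:
            print(f"skipping {p}: {e}")  # real pipelines queue these for conversion
    return records

if __name__ == "__main__":
    sample = Path("report.txt")
    sample.write_text("Vehicle readiness at 87 percent as of Monday.")
    for rec in to_records([sample, Path("briefing.pdf")]):
        print(json.dumps(asdict(rec)))
```

The `label` field stays empty until a human fills it in, reflecting Martell's point that knowing what you want to detect means people, not the pipeline, have to label the data.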

Even after a given AI model is built and trained, Martell said, an operational AI solution will need to be continually tested and retrained as scenarios change. That will likely be a paramount challenge for his incoming successor, Radha Plumb.

“I wish we had gotten further on that in the two years I've been here, but that's going to be my biggest charge for Dr. Plumb: is really focused on how do we model and measure the value of models over time,” he said.
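One minimal form of the measure-over-time problem Martell hands to Plumb is sketched below: re-score a deployed model against a fixed evaluation set at each checkpoint and flag degradation. The toy model, evaluation pairs, and tolerance are stand-ins, not anything drawn from CDAO's work.

```python
# Hypothetical drift monitor: re-score a deployed model on a fixed eval set
# and alert when accuracy falls below the baseline set at deployment.
# Model, eval set, and tolerance are illustrative stand-ins.

from typing import Callable

EvalSet = list[tuple[str, str]]  # (input, expected output) pairs

def accuracy(model: Callable[[str], str], eval_set: EvalSet) -> float:
    correct = sum(1 for x, expected in eval_set if model(x) == expected)
    return correct / len(eval_set)

def monitor(model: Callable[[str], str], eval_set: EvalSet,
            baseline: float, tolerance: float = 0.05) -> None:
    """Compare current accuracy to the deployment baseline; a drop beyond
    the tolerance is the signal to retest or retrain."""
    score = accuracy(model, eval_set)
    if score < baseline - tolerance:
        print(f"ALERT: accuracy {score:.2f} below baseline {baseline:.2f}; retrain or review")
    else:
        print(f"OK: accuracy {score:.2f} within tolerance of baseline {baseline:.2f}")

if __name__ == "__main__":
    eval_set = [("2+2", "4"), ("3+3", "6"), ("5+5", "10")]
    model = lambda q: str(sum(int(t) for t in q.split("+")))  # toy stand-in model
    monitor(model, eval_set, baseline=1.0)
```

Run on a schedule against real operational data, a check like this is one way to "model and measure the value of models over time" as the scenarios a system faces drift away from its training data.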