The Commander’s AI Smartcard
Artificial Intelligence is Commanders’ Business
by John Anderson, Marc Losito, Sean Batir
Artificial Intelligence (AI) is changing warfare, and the herculean advancements of private industry are now reaching the warfighter on the ground. The operationalization of AI will only continue to advance throughout the Joint Force, but a lack of domain expertise introduces risk into the commander’s decision-making cycle. To integrate AI properly across the force, commanders require a framework for understanding the risks and challenges of AI integration: use case, data, compute, algorithm development, test/evaluation, and deployment.
The possibilities of AI are so intoxicating and the buzz so loud that commanders may easily be fooled by demoware. When excitement for new technologies is at a fever pitch, it’s time for commanders to provide the same sober, thoughtful assessment they would give any other weapons system. Real AI that truly increases the lethality of the force is at our doorstep, but it’s not easy, and anyone who pitches quick AI wins is likely pitching snake oil. This Commander’s AI Smartcard provides a necessary rubric for commanders and senior staffs to evaluate AI capabilities and quickly separate the real from the fake.
In January 1801, Eli Whitney, inventor of the cotton gin, held court with President John Adams, Vice President Thomas Jefferson, and senior military leaders to demonstrate the power of interchangeable parts for muzzle-loading muskets. Whitney’s miracle musket was a glimpse into the dawn of the machine age and promised to revolutionize the character of warfare to the United States’ advantage. The lethal promise of Whitney’s musket bears a striking resemblance to the hope of artificial intelligence (AI) and machine learning (ML), but what can Eli Whitney’s musket saga teach us about the promise of AI on the modern battlefield?
Prior to Whitney’s miracle musket, firearms production was a highly skilled endeavor requiring the labor of a trained, experienced craftsman. Each musket was custom built with brittle parts; each was almost like a piece of art. These custom weapons were expensive and unpredictable. Commanders and soldiers had no sense of how effectively the weapons would fire (straight, sideways, or not at all) or how many would fail in battle. If the weapon broke on the battlefield, there was no easy way to fix it without a skilled craftsman’s hands. If a weapon catastrophically malfunctioned, the soldier was out of the fight until a weapon could be taken from a fallen comrade or vanquished foe.
Whitney’s new musket with interchangeable parts promised to change all of this. Weapons with interchangeable parts could usher in an age of mass production—higher volume, lower costs, and more predictable performance. The lethality of the Joint Force would no longer be held at the mercy of artisans, and commanders would know a single standard for quality, increasing accuracy and confidence. This was certain to increase the lethality of the Joint Force dramatically. The potential was intoxicating.
The demonstration, however, belied this promise of lethality and U.S. advantage. Whitney hadn’t actually produced a miracle musket with interchangeable parts; he simply created the illusion that he had. Whitney used secretly pre-marked pieces to assemble locks and screwed them into different muskets, ostensibly demonstrating “interoperability.” In modern parlance, Whitney produced “demoware”: a product built to impress in a demonstration rather than to work in the field.
The audience was none the wiser. Between Adams, Jefferson, and any number of military leaders in the audience, not a single one of them knew the right questions to ask Mr. Whitney. They fell for the ruse. They were betrayed by a desire for innovation, next-generation weaponry, and a pressing need for new technologies to help a nascent U.S. military prepare for war. Alas, it was all a farce to win a richly priced government contract, and it worked.
Like Mr. Whitney’s unsuspecting audience in 1801, do modern-day commanders understand the underpinnings of AI well enough to spot AI hucksters and charlatans? Any new military technology, whether a musket or AI, brings a steep learning curve that commanders must climb before they can make informed life-and-death decisions.
This is why you, commanders at all levels touching AI, need an AI Smartcard. Just as Adams, Jefferson, and their military commanders needed a certain level of domain expertise to question Mr. Whitney, so too do commanders need a resource to guide engagements with innumerable AI providers. The smartcard is a syllabus to help commanders demystify AI; distinguish real, enduring capabilities from “drive-by AI”; understand the strengths, limitations, and key performance attributes of AI systems; and make risk and investment decisions on AI that actually increase the lethality of the Joint Force. The Commander’s AI Smartcard provides a rubric to evaluate AI capabilities, provide key inputs into development, and spot an AI huckster early in the process. Understanding these subject areas puts commanders on a path to realize the promise of AI: lethality at scale against our nation’s adversaries.
AI is as ethereal today as interchangeable musket locks were at the turn of the 19th century. Like the miracle musket, AI is the promise of a more lethal force at scale. Just as AI has already changed the nature of competition in private industry, it is changing the character and fabric of combat. It will soon chew through military formations. AI promises to let commanders at all levels churn through an otherwise unmanageable torrent of data to make decisions at speed and scale. In the future, AI promises to remove humans from the loop altogether, empowering machines with the commander’s intent. He who first ascends the commanding heights of AI will have an almost unassailable competitive advantage.
Because AI directly impacts a commander’s purchase of risk and decision-making cycle, AI must not be developed in isolation from commanders as if we’re buying beans and bullets. Commanders must play a critical role in developing narrow and general AI systems that enhance a unit’s ability to make better decisions. Traditional weapon system development is a multi-year process of requirements, development, testing, and eventually fielding to commanders. The AI Smartcard helps commanders and their staffs ask informed questions during the development process and prioritize their investments accordingly.
The smartcard is structured into six main categories, corresponding to the core functionality comprising an AI system. This is not meant to be sacrosanct; it will need to be adjusted as formations respond to AI and new use cases are established. At the very least, it should provoke a good discussion with vendors, government agencies, and other vital stakeholders purporting to hold the crown jewels of your AI weapon system. The more commanders experiment with advanced AI applications, the better the smartcard will become.
The use case should drive AI development. Until we reach the point of general AI, we’re going to be living in the realm of acutely defined use cases, likely based on accelerating or automating existing workflows. Commanders and senior staff should spend a lot of time identifying pain points and outlining workflows to identify specific AI applications that would be most useful. An operations officer (J3) once remarked, “You guys are focused on developing flying cars when I’m spending days pulling together reports from email, folders, chats, and other trapped data. Can you work on that?” Too often we focus on the neon lights of AI, the promise of a dream, without stopping to think through how it can truly enhance the lethality of the Joint Force. The commander should drive this discussion, not the vendor or service provider. Start with AI applications specific to warfighting functions, then put it all together to cut across these artificial boundaries.
Data is the coin of the realm. For the majority of supervised deep learning approaches, nothing happens without cleaned, normalized, structured, labeled data pumping into the AI platform in real time. Commanders often assume we’re sitting on a treasure trove of data based on the operational tempo of U.S. forces deployed in nearly constant combat operations over the past two decades. It’s true, we have a lot of data, but it’s usually trapped and inaccessible for AI applications. The key differentiator is structured, labeled data sets. The quantity of labels is a key indicator of the model’s quality. It is not the only indicator, and some models perform well with fewer labels than others, but commanders should know how many labels were used to train the model. Additionally, data ontology is a critical component of data; commanders need a common ontology so that different models aren’t optimized for different ontologies. Otherwise, it would be the equivalent of every formation on the battlefield using a different version of a 9-line MEDEVAC, each with its own specific language. Separately, if data is absolutely critical to an AI application’s performance, then we must agree that protecting the data is vital. If the service provider is storing the data on its own networks, then commanders should understand how secure this data is and how the vendor intends to protect it in the future. Just as adversaries develop high-payoff and priority target lists to cripple our lethality, so too will they attempt to pollute our AI with false data.
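As a rough illustration of the two questions above (how many labels, and do they follow a common ontology), a staff officer or analyst could run a check along the following lines. This is only a sketch under stated assumptions: the ontology terms, the vendor label lists, and the function name `audit_labels` are all hypothetical and chosen for illustration, not drawn from any real program.

```python
from collections import Counter

# Hypothetical common ontology agreed on by the command (illustrative only).
COMMON_ONTOLOGY = {"vehicle", "person", "building"}


def audit_labels(labels, ontology=COMMON_ONTOLOGY):
    """Summarize a labeled data set: total labels, labels per class,
    and any class names that fall outside the common ontology."""
    counts = Counter(labels)
    return {
        "total": sum(counts.values()),
        "per_class": dict(counts),
        "off_ontology": sorted(set(counts) - set(ontology)),
    }


# Hypothetical label lists from two vendors (invented data).
vendor_a = ["vehicle", "vehicle", "person", "building"]
vendor_b = ["truck", "person", "person", "structure"]

report_a = audit_labels(vendor_a)
report_b = audit_labels(vendor_b)

print("Vendor A:", report_a)
print("Vendor B:", report_b)
```

In this made-up case, both vendors report the same number of labels, but the second uses class names ("truck", "structure") that are not in the shared ontology, which is exactly the mismatch the 9-line MEDEVAC analogy warns about.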
AI requires a LOT of compute power. One reason we see revolution after revolution in commercial applications for AI is the discovery of new ways to leverage compute power, including the power of Graphics Processing Units (GPUs) to train and deploy models. A few years ago, experts could ask vendors a straightforward question, “How many GPUs do you have?” as a litmus test to determine whether or not they actually had a real AI capability. The rule of thumb was 10 GPUs per data scientist. A rough number, but it was a starting point. Some vendors, especially the Beltway contractors, would brag about having one GPU, in the corner, next to their data scientist. It was a farce. Commanders will want to understand how the AI partner is thinking about compute power. Not all GPUs and CPUs are created equal, and each vendor may leverage these resources differently. They could very well be utilizing a public or private cloud that provides the compute infrastructure. Maybe they’re providing their own. Identifying these features will help commanders understand the dependencies inherent in the AI capability.
Talent and process matter here. AI expertise is being recruited with gusto and pay akin to NFL superstars. We’re not talking about a soldier doing some coding on the weekends or a self-certified “data scientist” piecing together a machine learning model. We’re talking about real AI talent and skill sets. Commanders need to understand who is developing the models, get an idea of their expertise and experience, and understand the development environments. Commanders will also need to understand the vendor’s experience working with the military and government. As you go down this path, you will soon discover that the most remarkable AI talent is likely not in government. They are probably hidden among your contractors, the private sector, and academic partners. If you find a crown jewel of talent, grab hold of them as tightly as possible and never let go. Your service provider will need to understand the peculiarities of working with the government and military. There’s unique domain experience within these channels that will need to be understood to develop AI applications that actually increase the lethality of the Joint Force. This means a service provider working as a partner, understanding the current limitations of military networks and developing bespoke tools based on your needs, not on a fanciful idea of how AI is applied in the private sector.
Test and Evaluation (T&E).
Just as commanders test the accuracy of other weapons systems on the range and in exercises, they need to know how the AI will perform, at scale, in various situations. You’re looking for numbers on precision and recall, where they are now, and how they have evolved over time. Generally speaking, precision and recall are measures of relevance, validity, and sensitivity. Their interpretation will change depending on the task and use case, but understand that precision is the share of the model’s detections that are correct, so it falls as false positives rise, while recall is the share of true instances the model actually finds, so it falls as false negatives rise. Also, precision and recall numbers don’t tell the full story. Service providers can game the T&E process to show a more performant model optimized for a very specific use case, biome, region, or application. The numbers will look great until you try to use the AI in a slightly different situation, and then it will fail. Commanders need to understand this in great detail and should expect to receive updated T&E numbers for every new model that drops into your system. If your vendor or agency partner is telling you not to worry about it, then there’s likely no T&E being done, which is concerning. You should also look for quality control on the T&E numbers. You’ll probably find that there’s internal T&E conducted by the algorithm developer, then another set of T&E conducted by a government/military partner. Look at both numbers, compare them, and ask questions.
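For readers who want to see the arithmetic behind these numbers, precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP, and FN are the counts of true positives, false positives, and false negatives. The sketch below computes both overall and per operating condition, to show how a single headline number can hide poor performance in one environment. All data, condition names ("desert", "jungle"), and function names are hypothetical and invented for illustration; this is not any vendor’s actual T&E method.

```python
from collections import defaultdict


def precision_recall(pairs):
    """Return (precision, recall) for (predicted, actual) boolean pairs."""
    pairs = list(pairs)
    tp = sum(1 for predicted, actual in pairs if predicted and actual)
    fp = sum(1 for predicted, actual in pairs if predicted and not actual)
    fn = sum(1 for predicted, actual in pairs if not predicted and actual)
    # None signals "undefined" when there is nothing to divide by.
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


def precision_recall_by_condition(records):
    """Group (condition, predicted, actual) records and score each group."""
    groups = defaultdict(list)
    for condition, predicted, actual in records:
        groups[condition].append((predicted, actual))
    return {name: precision_recall(group) for name, group in groups.items()}


# Hypothetical T&E results: the condition names and counts are invented.
records = (
    [("desert", True, True)] * 3      # correct detections
    + [("desert", True, False)] * 1   # false positives
    + [("desert", False, True)] * 1   # false negatives (misses)
    + [("jungle", True, True)] * 1
    + [("jungle", True, False)] * 3
    + [("jungle", False, True)] * 3
)

overall = precision_recall((p, a) for _, p, a in records)
per_condition = precision_recall_by_condition(records)

print("overall:", overall)
for name, (precision, recall) in per_condition.items():
    print(name, precision, recall)
```

With this invented data, the overall figures are 0.5 precision and 0.5 recall, but the breakdown is 0.75 and 0.75 in the desert against 0.25 and 0.25 in the jungle. Asking for the per-condition breakdown, from both the developer and the government evaluator, is one way to spot a model tuned to a single environment.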
Deployment on an AI-Enabled Platform.
In many ways, it’s much easier to develop robust AI capabilities in the commercial sector than in the government/military domains. This is because the commercial sector is not hampered by arcane programs of record. The bureaucratic program of record concept was built in the 1950s and 1960s when we had to develop, procure, and maintain massive weapons systems such as aircraft carriers and stealth bombers. For some reason, the government is buying software and AI as if it were purchasing massive weapons systems. Commanders are burdened with one or several of these programs of record and forced to use them even though they break all the time, are never functional, and never operate as advertised. In light of these limitations, commanders need to understand how the AI weapon system integrates with these programs or how the vendor is approaching the deployment and visualization of models on an AI-enabled platform. Don’t think of this as merely a sexy user interface (UI); think of this as a full AI platform that can capture human-computer interaction and continuously feed new use cases for the algorithm. You’ll likely discover that legacy tools and all those programs of record can neither integrate with AI/ML nor feed your AI pipeline. You’ll also find that the UI platform will handle certain parts of the user workflow while the AI algorithm handles other key pieces. The trick is to identify the seamless interaction of the two, meaning how the UI and backend platform handle data for the AI algorithm and vice versa, to create a powerful capability for users at all levels.
There are some ubiquitous concerns to keep in mind throughout the AI development process, but they can be thought about separately from the core elements of AI.
Information Assurance and Authority to Operate.
Information Assurance (IA) and Authority to Operate (ATO) are often an afterthought. Your network specialists should understand the idiosyncrasies of network architecture and have a plan to deploy your AI platform. With that said, keep in mind that you’ll need an ATO on each and every military network where you plan to deploy this AI capability. Oftentimes, vendors and agencies will rely on an ATO approved by one service or agency and count on it being reciprocated on other networks. Whether that reciprocity actually holds will make or break your capability, so be aware of it.
AI isn’t cheap. If done right, it’s incredibly expensive. If properly employed, it’s worth every penny and will result in a competitive edge on the battlefield. If someone is promising to do it on the cheap, then they’re probably giving you drive-by AI. Another cost indicator that can help you distinguish real from fake solutions is the vendor’s cloud costs and burn rate per week and month. Ask your service provider what they are spending and how. Their response, or lack thereof, will be an important indicator.
Development Security and Operations (DevSecOps) Pipeline.
For AI to work, you’re going to need a well-defined development platform and pipeline. Military and government networks are designed for scale and security, not for upgrading advanced software and algorithms multiple times a day. Your networks are designed to get a massive upgrade once or twice a year, do a laborious scan of risks, and then upgrade at scale. It’s a mess and doesn’t work with AI. You’ll need to ensure your vendor or service provider has a robust DevSecOps process that allows seamless updates of code, remotely, multiple times per day without getting shut down by some obscure office designed to “protect” networks. Over time, DoD will redefine its conceptualization of risk and get to the point where it understands that it inherits more risk fighting on old code than it assumes by upgrading more often. We’re not there yet, but for the purposes of your AI capability, you’ll want to ensure that you’re not building on outdated methods if newer, performant, peer-reviewed algorithms are available in the public domain.
This is your Commander’s AI Smartcard, a rubric to inform risk and decision-making when confronted by the next Eli Whitney of AI. More than two hundred years after Eli Whitney’s ruse, the landscape of warfare has drastically changed, from interchangeable musket parts to interchangeable blocks of non-physical, digital code that can detect friend or foe. Yet, although the tools of the battlefield have changed, we are still human. We are always subject to having the wool pulled over our eyes. Assuming the AI capability meets this initial scrub, commanders will face the uphill challenge of data: untrapping, accessing, cleaning, structuring, and labeling. But we must confront risk before challenge, the risk of trapped domain knowledge in AI decision-making. Over time, commanders and senior battle staff will develop their own rubrics to better decipher the heart of their AI requirements. AI is too important to sit outside the commander’s domain, and the AI Smartcard is meant to be a starting point for internal expertise around AI technology.
The views expressed are those of the authors and do not reflect the official position of Duke University, Department of the Army, or Department of Defense.