
Emerging Voices Series: Rethinking Artificial Intelligence at the Strategic Frontier

[May 2026]

An Interaction-Centered Approach to Trustworthy Decision-Making in Defense and National Security

Artificial intelligence (AI) as a strategic frontier is often framed in technological terms, defined by algorithmic capability, scale, and speed. Yet the strategic impact of AI lies not primarily in its artificiality but in how it is transforming what we consider intelligent at both the individual and organizational levels.

The prominence of AI in how we think about intelligence today traces back to World War I- and II-era concepts of autonomous (or “unmanned”) systems, which enable people to work faster and more precisely. Norbert Wiener coined the term “cybernetics” in the 1940s, referring to the interactive loops between biological and artificial systems that make this joint prowess possible. In the 1950s, Paul Fitts enumerated which kinds of physical and cognitive tasks “men are better at, [and] machines are better at”, a list still often invoked in the design of automation systems under the shorthand “MABA-MABA”. As early as 1960, however, J.C.R. Licklider speculated that advances in AI would result in “man-computer symbiosis”, by which “men and computers working together in intimate association […] should be intellectually the most creative and exciting in the history of mankind”.

In other words, the idea that people and machines can collectively achieve levels of intelligence beyond their individual potentials is not new. What has changed is the nature of the systems involved. “AI” is increasingly associated with algorithmic models whose inner workings are not readily interpretable by the people expected to use them. As AI systems become more embedded in decision processes, end users, such as drone operators or law enforcement personnel, exercise less direct control over how decisions are generated, even as they remain responsible for their outcomes. This creates what can be described as a “responsibility gap”, though framing it this way risks obscuring the underlying issue. The problem is not simply that AI systems are opaque, but that they are treated as if they were external to human decision-making when they are, in fact, inextricable from it.

Unsurprisingly, there is growing concern that these limitations will lead to people being removed from the decision-making loop entirely in favor of fully autonomous systems. Yet legal and ethical constraints, particularly in safety-critical domains such as national security and warfare, preclude the full automation of consequential decisions. The challenge, therefore, is not to prevent the elimination of human involvement. Rather, it is to answer a different question: what does it mean for people and AI to decide together, and how can we ensure that this happens in a trustworthy way?

From Trust to Trustworthiness in AI-Enabled Systems

In the field of human factors and ergonomics, designing for trustworthiness traditionally implies a dual focus. The first is creating automated systems that are highly reliable at a defined set of tasks. The second is training human users to rely on automation appropriately by understanding how it functions and when or how it might fail. Thus, for a long time, answering the question “How appropriately does the user trust the automation?” served as a sufficient proxy for assessing how trustworthy the human-and-automation unit is at making decisions.

Today, enabling users to trust AI appropriately remains an important and rapidly evolving area of interest across disciplines. For example, large-scale research efforts such as DARPA’s Explainable AI (XAI) program seek to develop algorithmic interventions that can equip users with heuristic information about AI decision logic. Human factors engineering and psychology researchers, on the other hand, continue to produce a plethora of metrics and methods for understanding people’s trust in AI. As a result, prominent AI design guidelines emphasize the importance of increasing user trust through transparency (e.g., Google’s People + AI Guidebook). Amid all of this, however, there is substantial evidence that transparent AI design can increase users’ trust without actually improving decision quality. In national security and defense applications, where vulnerabilities are often exploited through highly precise attacks, unwarranted trust can be as dangerous as distrust. Systems that are trusted too readily may be scrutinized less, making it more likely that errors go undetected until it is too late. Historical incidents such as the USS Vincennes’ 1988 shootdown of Iran Air Flight 655 and the 2003 Patriot missile fratricide incidents highlight the catastrophic risks of the overtrust that a system’s design can invite.

It is critical to close the gap between the tools and metrics available to AI system designers and the specific trustworthiness demands of national security contexts. The dual focus of designing for appropriate reliance is no longer enough in an era where rapid, data-intensive decisions are the norm. AI is so central to this new norm that “trustworthiness” is now better understood through the tightly interconnected succession of human and AI inputs that precede decisions. In other words, trustworthiness is no longer a byproduct of individual users’ attitudes about machine reliability. Instead, it is a property of the collective as a whole, which is one reason the term “human-AI teaming” has become a core component of national security research. Therefore, if we want trustworthy AI-enabled systems, we must design and evaluate them around the fundamental building block of their outcomes: the interaction.

From Human- and AI-Centered to Interaction-Centered

Focusing on interaction as the fundamental unit of design and evaluation requires a shift in how we approach both the development and the assessment of AI-enabled systems. Rather than treating human-centered and AI-centered design as separate paradigms, an interaction-centered approach explicitly connects the two, grounded in an understanding of how design decisions are made in practice. This approach can be understood as two complementary activities, interaction-centered design and interaction-centered evaluation, which together form a continuous research and development loop.

Interaction-Centered Design

Interaction-centered design means going beyond the questions of what information AI systems should provide to users and which underlying AI architectures will constitute the system. Many current approaches to trustworthy AI, such as explainable AI, operate under the assumption that giving users insight into AI reasoning will lead to better decisions down the line. However, design decisions must account for how joint human–AI decision-making unfolds within operational contexts. This requires two interaction-centered shifts in how design is approached. First, the human–AI interaction itself must be treated as a primary design outcome rather than merely the mechanism through which decisions happen. Second, the design process must itself be interactive, so that design ideas are grounded in how decisions unfold in practice rather than in the assumption that design intent will translate cleanly into use.

Consider, for instance, how AI transparency is often implemented through features such as visualizations, numerical confidence scores, or natural language explanations. Decades of research show that people interpret probabilistic estimates and explanations of automation logic not only through the uncertainty information itself but also in light of their own task expertise. Yet these design choices are typically made by developers whose mental models of AI processes differ substantially from those of end users.

Let me provide an example. What does it mean to tell a trained person that an AI system is “90% confident” in its output? To an airport security officer assessing whether a bag is truly free of threats, this might be too low, because the cost of a missed detection is catastrophic; despite the high number, the officer might still flag the bag for manual search. In contrast, a triage clinician might see the same estimate as implausibly high for a diagnostic recommendation, where uncertainty is expected and overconfidence can be dangerous. In both cases, these interpretations could also shift depending on how many more decisions the user still has to make. The issue is not simply that users may misunderstand a numerical output, but that its meaning is inseparable from the context in which it appears. An AI design feature can improve decision-making in one setting while degrading it in another. Few fielded systems explicitly draw on design guidelines for adapting the information presented to users based on their history of human–AI interactions, although policy frameworks for mitigating the risks of adaptive AI design now exist.

National security and defense research can draw on participatory design traditions in other safety-critical domains, where trustworthiness is evaluated not only in terms of technical performance but also in terms of how people and technologies function together under real-world constraints. For example, studies of “smart” hospital infusion pumps show that safety features are best designed with end users, to prevent mismatches between device design and existing clinical workflows. Successful participatory design efforts in healthcare focus less on improving device accuracy and more on ensuring that clinicians’ interactions with these systems integrate seamlessly into their existing routines. This highlights a crucial principle: if the goal is to enable trustworthy decisions with AI, system designers and stakeholders must actively collaborate to ensure that final designs do not impair frontline workers or dissuade them from using AI systems in the long run. Successfully translating participatory design approaches to AI-enabled defense systems is therefore a keystone of the strategic design of AI applications for national security decisions.

Interaction-Centered Evaluation

If interaction is the fundamental unit of design, it must also be the fundamental unit of evaluation. At present, however, AI systems are primarily evaluated at the level of the model, using benchmark metrics that compare performance across standardized tasks. These benchmarks, while essential for tracking algorithmic progress, offer limited insight into how advances in foundation models will change the behavior of the human-and-AI unit in real-world decision-making environments. In particular, they rarely capture how model errors may compound with human cognitive biases, or how these combined effects can shape decisions over time.

Many advances in foundation models are motivated by technical objectives such as improving accuracy or computational efficiency. While these improvements may yield incremental gains in classical human–AI interaction measures, such as the reliability of AI recommendations, their impact on actual decision quality in applied settings is often indirect and poorly understood. This gap is especially consequential in high-stakes domains: systems that perform well on benchmarks may still contribute to poor decision-making when deployed under conditions of uncertainty, time pressure, and incomplete information.

For example, in August 2021, a U.S. drone strike in Kabul intended to neutralize an imminent ISIS threat killed ten civilians, including seven children, after a vehicle was misidentified within a fast-moving intelligence and surveillance process. This tragedy highlights how failures can emerge not only from incorrect data but also from how information is interpreted and acted upon under time pressure. As AI-assisted targeting becomes more prevalent, there is a growing need for evaluation metrics that capture how such systems shape what operators see, prioritize, and act upon.

At the same time, the infrastructure required to evaluate AI systems in realistic human-in-the-loop settings is costly and difficult to scale. Controlled human-subjects experiments provide valuable insights into how people interact with AI, but they are resource-intensive and often lag far behind the pace of algorithmic innovation. Increasing the fidelity of these evaluations, e.g., through more realistic simulations, larger participant pools, or longitudinal studies, further compounds these costs. As a result, evaluation efforts are frequently decoupled from the development of foundation models, limiting their ability to inform design decisions in a timely manner.

Advancing an interaction-centered evaluation approach would address these challenges by focusing explicitly on the joint human–AI cognitive process. First, it would spur the development of evaluation methods that supplement purely algorithmic benchmarks with measures grounded in observed interaction data. These include metrics that capture how users and their AI counterparts communicate and interpret each other’s insights, how quickly and effectively they detect and intervene on potential reasoning errors, and how decision quality evolves over successive human–AI interactions. Recent efforts in national security research point toward scalable approaches in this direction. For example, DARPA’s Artificial Social Intelligence for Successful Teams (ASIST) and Exploratory Models of Human–AI Teams (EMHAT) programs have explored simulation-based environments for studying coordination and social reasoning in various human–AI teaming contexts. These programs offer a model for leveraging advances in AI technologies, such as large language models, to test their downstream effects on actual human–AI collaborative work.

Advancing interaction-centered evaluation would also improve coordination between AI researchers and system evaluators. Given the cost and complexity of high-fidelity human-in-the-loop evaluation, it is neither feasible nor necessary to evaluate every incremental improvement in model performance. However, there is a need to identify thresholds at which advances in foundation models are likely to meaningfully affect actual human–AI interactive decision-making, whether in controlled laboratory studies or in more naturalistic applied environments. The U.S. Air Force’s Decision Advantage Sprint for Human-Machine Teaming (DASH) events provide a model for how interaction-centered evaluations can be structured to bring commercial AI developers and applied researchers together and guide the development of tomorrow’s warfighting technologies. Establishing these evaluative thresholds would enable more strategic allocation of resources, ensuring that empirical studies are conducted where they are most likely to yield operationally relevant advances, both in foundational AI research and in fielding existing applications.

As a strategic frontier for national security and defense, AI is fundamentally about enhancing what it means to be intelligent in today’s landscape of risk and conflict. This comes down to what AI must ultimately contribute: decisions that are more trustworthy not only because they are effective and efficient, but also because they ethically support both national security personnel and the people they serve to defend. Today’s AI systems have brought us to the threshold of Licklider’s era of “man-computer symbiosis.” The interactions that make this symbiotic relationship possible must therefore become a focus of innovation rather than remain an implicit byproduct of it. Fortunately, decades of research and development have already yielded a strong foundation on which to build in effecting this change. We must continue bridging algorithmic innovation and real-world impact toward a future where human–AI decisions strengthen, rather than risk, global security.

This essay is from the Security & Defence PLuS Emerging Voices Series, which highlights the next generation of scholars and practitioners shaping thinking on strategy, security, and defence. The series brings together perspectives from PhD candidates and early career researchers, grounded in the complex geopolitical realities of the 21st century.

The collection explores a “Latticework of Resilience” that connects often-overlooked sectors, such as subnational diplomacy and the critical infrastructure underpinning agricultural security, to the core of national security. Taken together, the essays emphasise the importance of adaptive, multidisciplinary approaches to building resilience in an increasingly complex global environment.

The first event in the Emerging Voices series was held at Arizona State University on 2 March 2026, with events to follow at King’s College London and UNSW. Watch the event recording: The Emerging Voices Series: Strategy, Security & Defence at Arizona State University.


About the Author

Myke C. Cohen

Myke C. Cohen is an Associate Scientist at Aptima, Inc., and a PhD student in Human Systems Engineering at Arizona State University. His research focuses on the design and evaluation of human–AI systems in safety-critical contexts, including transportation security, defense, and intelligence. Cohen is a recipient of the Human Factors and Ergonomics Society’s Student Member with Honors Award, and was named an Ira A. Fulton Schools of Engineering Dean’s Fellow and the inaugural CHART Scholar at Arizona State University. Prior to his doctoral studies, he served as an Instructor of Industrial Engineering at the University of the Philippines Diliman, where he earned his BS in Industrial Engineering.
