- 1.
- 2. “The Perils of Highly Interconnected Systems,” MIT Technology Review, 2012.
- 3.
- 4. “Dealing with Information Overload: A Comprehensive Review,” National Library of Medicine, 2023.
When it comes to the intricate web of interconnected systems that hold our lives in a delicate balance, a single misstep can trigger a cascade of catastrophic events. In July 2024, the cybersecurity company CrowdStrike pushed a faulty update to a monitoring product, and this routine error resulted in massive and costly disruptions across the globe. Businesses faced staggering revenue losses, and state governments experienced significant process disruptions.1
The CrowdStrike outage highlighted the fragility of today’s IT systems. This incident is far from an isolated case. Research has consistently shown that unplanned system outages can have significant impacts on critical infrastructure sectors. The interconnectedness of these systems means that a failure in one area can affect multiple critical services simultaneously. As early as 2012, MIT Technology Review noted that the most highly interconnected systems can give rise to catastrophic domino effects.2
The proliferation of AI agents capable of performing tasks with minimal human oversight has the potential to fundamentally change how IT systems are managed and lessen the significant cognitive burden IT professionals carry. The introduction of agents also comes with a set of complex questions. Which tasks within IT environments should be automated? How do enterprises ensure human IT professionals can focus on the critical tasks they alone can handle? What role should humans play in overseeing the work of agents?
In this article, we use the Endsley Situational Awareness Model—a three-level theoretical model of situational awareness—to deconstruct situational awareness and decision making in IT environments, provide prescriptive guidance on how to apply artificial intelligence (AI) for IT Operations (AIOps) to address many of the challenges human IT operators face, and identify necessary cognitive supports for times when humans are required to coordinate with and oversee AI agents.
According to the Endsley model, the first level of situational awareness entails perceiving the status, attributes, and dynamics of relevant elements in a given environment. Processing and recalling vast amounts of information can overtax human memory and attention and lead to burnout and critical errors. This cognitive overload can manifest in two ways: focusing on irrelevant data points (errors of commission) or overlooking crucial information (errors of omission). Both scenarios compromise perception and degrade the ability to make informed decisions, which is why it comes as no surprise that this is the stage where most errors occur—70 to 80 percent of situational awareness mistakes happen during perception.3
The impact of information overload on perception is not just theoretical. Research shows that information overload is associated with serious performance losses, especially in connection with disruptions and interruptions.4 In IT operations, where quick responses to system alerts and anomalies are crucial, such performance losses can have far-reaching consequences.
AIOps addresses these perception-related challenges by filtering vast amounts of data to highlight only the most relevant information. Presenting human operators with actionable insights reduces their cognitive load. AIOps can go one step further by identifying subtle patterns or anomalies that might escape human attention, thereby enhancing perception and reducing the risk that an important signal will be overlooked.
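As a rough illustration of the filtering idea, the sketch below flags only the metric samples that deviate sharply from what came before, so an operator sees one signal instead of the whole stream. The function name `flag_anomalies`, the z-score rule, the threshold of 3.0, and the sample CPU readings are all assumptions made for this example; production AIOps tools typically use more robust detectors.

```python
from statistics import mean, stdev


def flag_anomalies(samples, threshold=3.0, min_history=5):
    """Return (index, value) pairs that deviate strongly from earlier samples.

    Each sample is compared against the mean and standard deviation of all
    samples before it (a simple z-score rule chosen for illustration).
    """
    flagged = []
    for i in range(min_history, len(samples)):
        history = samples[:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: a z-score is undefined, so this
            # simplified sketch skips the sample instead of guessing.
            continue
        z_score = abs(samples[i] - mu) / sigma
        if z_score > threshold:
            flagged.append((i, samples[i]))
    return flagged


# Hypothetical CPU utilization readings (percent), one per minute.
cpu_percent = [41, 43, 40, 42, 44, 41, 43, 95, 42, 40]
print(flag_anomalies(cpu_percent))  # [(7, 95)]
```

Only the spike at index 7 is surfaced; the nine ordinary readings never reach the operator. Note that the spike stays in the history for later samples, which makes this naive detector less sensitive right after an incident.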
The second level of the Endsley model addresses the importance of understanding the significance of the perceived elements in relation to operational goals. This is where context becomes crucial, and patterns start to emerge.
AIOps supports comprehension in three ways: it correlates data from multiple sources to provide a holistic view of the IT environment, offers context-aware insights that explain the potential impact of observed phenomena, and reduces the time required to diagnose issues by automatically identifying root causes. With improved comprehension, operators at all experience levels can make sense of the environment and make more informed decisions about how to respond.
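The correlation and root-cause steps can be sketched in a few lines. The example below groups alerts that arrive close together in time, then uses a service dependency map to suggest which alerting service is the likely origin. The `Alert` type, the `DEPENDS_ON` map, the 60-second window, and the sample alerts are hypothetical, and the heuristic is deliberately simplistic compared with what real AIOps platforms do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str       # hypothetical monitoring tool that raised the alert
    service: str      # service the alert is about
    timestamp: float  # seconds since some reference point
    message: str


# Hypothetical dependency map: service -> services it depends on.
DEPENDS_ON = {
    "web": ["api"],
    "api": ["db"],
    "db": [],
}


def group_by_window(alerts, window_seconds=60.0):
    """Group alerts whose gap to the previous alert is within the window."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][-1].timestamp <= window_seconds:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups


def probable_root(group, depends_on=DEPENDS_ON):
    """Return alerting services none of whose dependencies are also alerting."""
    alerting = {a.service for a in group}
    roots = [
        service
        for service in alerting
        if not any(dep in alerting for dep in depends_on.get(service, []))
    ]
    return sorted(roots)


alerts = [
    Alert("apm", "web", 100.0, "5xx spike"),
    Alert("apm", "api", 110.0, "latency above SLO"),
    Alert("infra", "db", 95.0, "connection pool exhausted"),
    Alert("scheduler", "batch", 500.0, "job retry"),
]

for group in group_by_window(alerts):
    print([a.service for a in group], "->", probable_root(group))
```

Three alerts from two different tools collapse into one incident that points at `db`, while the unrelated `batch` alert is kept separate.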
The highest level of situational awareness involves projecting future actions and states of elements in the environment. In IT operations, this translates to predicting potential issues before they occur and understanding the likely outcomes of different actions.
AIOps enhances projection capabilities by using machine learning models to forecast system behavior and potential failures, simulating the impact of proposed changes before implementation, and providing decision support by evaluating multiple courses of action. By improving projection capabilities, AIOps empowers operators to take proactive measures, preventing issues before they impact critical services.
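A minimal sketch of projection, assuming a resource that grows roughly linearly: fit a least-squares line to recent disk usage and estimate how long until it crosses a threshold. The function names, the 90 percent threshold, and the sample readings are assumptions for illustration; real forecasting models would also account for seasonality and uncertainty.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(set(xs)) < 2:
        raise ValueError("need at least two points with distinct x values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def hours_until_threshold(hours, usage_percent, threshold=90.0):
    """Estimate hours from the last sample until usage reaches the threshold.

    Returns None when usage is flat or falling, because no crossing is
    projected under a linear trend.
    """
    slope, intercept = fit_line(hours, usage_percent)
    if slope <= 0:
        return None
    crossing_hour = (threshold - intercept) / slope
    return max(0.0, crossing_hour - hours[-1])


# Hypothetical hourly disk usage readings (percent full).
hours = [0, 1, 2, 3, 4]
disk_usage = [70, 72, 74, 76, 78]
print(hours_until_threshold(hours, disk_usage))  # 6.0
```

With usage rising two points per hour, the sketch projects six hours of headroom, which is the kind of lead time that lets an operator or agent act before a service is affected.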
While decision authority within IT environments currently resides with human operators, this paradigm may not remain feasible as the size and complexity of our IT landscape continue to grow. To keep pace with this accelerating growth, augmenting the work of human operators with autonomous AI agents will become unavoidable. Agent-based architectures, which handle non-deterministic scenarios by integrating specialized AI agents capable of perceiving their environment, making decisions, and taking actions to achieve specific goals, are poised to become more prevalent across industry.
The Endsley Situational Awareness Model provides insights on how best to enable agent autonomy. As organizations begin to appreciate the immense capabilities of these AI systems, the role of humans in the decision-making process will undergo a significant transformation.
This transition isn’t just a matter of technological capability but also of necessity. According to an Oracle study, 70% of business leaders would trust a robot more than a human to make financial decisions. This startling statistic shows the growing recognition that AI systems may be better equipped to handle certain complex decision-making tasks, particularly in data-rich environments like IT operations.
Initially, humans will remain in the loop, actively overseeing and guiding the actions of AI agents to ensure accuracy and alignment with predefined goals and ethical standards. AI agents will become responsible for perceiving the environment, analyzing the data to determine its significance, and providing possible projections about what may happen next, but humans will retain the decision authority. Having teams of agents create recommendations for human consideration is a crucial first step in building trust and confidence in the AI’s capabilities. It also allows human operators to intervene when necessary.
As the AI agents work together and demonstrate increasing reliability and effectiveness, humans will transition to on-the-loop roles. In this capacity, they will provide oversight and intervention only when required. The same teams of agents will still create recommendations structured by perception, comprehension, and projection; however, another agent will oversee these insights and make decisions about which course of action to take. Maintaining the same human-centric situational awareness structure will be critical for building trust in the AI agent’s decision making because it allows humans to better understand the “thinking” of the AI agent.
Trust is not a binary state but a continuum that develops over time through consistent, reliable performance. Research by Zhang, Liao, & Bellamy (2020) has shown that providing explanations for AI recommendations significantly improves accuracy and trust calibration. By further structuring the decision-making thought process in human terms, the development of trust can accelerate, and human oversight becomes natural rather than investigative.
As AIOps systems evolve, the relationship between human operators and AI will transform, with humans finding themselves out-of-the-loop and entrusting AI agents with full autonomy for many decision-making processes. This evolutionary process will require a new framework for gradually delegating decisions to the AI systems and placing human oversight at the macro level. This new framework for agentic autonomy should be comprehensive and adaptable.
While operating an IT environment will still require thousands of decisions each day, during this final stage, AI agents will make most of those decisions independently on our behalf. The transition to this level of autonomy will be gradual and carefully monitored, ensuring that the AI systems are fully capable of handling the complexities and nuances of their designated tasks. Agent-based architectures represent a paradigm shift in the way organizations approach decision making and operations. The transition from human-in-the-loop to human-on-the-loop, and eventually to human-out-of-the-loop, will require careful planning, continuous evaluation, and a commitment to ethical AI practices. As trust in AI systems grows, so will the number of autonomous AI agents we rely on each day.
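The three oversight stages described above can be pictured as a gate in front of every agent action. The sketch below is one hypothetical way to encode that gate, not a framework taken from this article: the `Oversight` enum, the `Recommendation` fields (which mirror the perception, comprehension, and projection structure), and the `decide` function are all illustrative names.

```python
from dataclasses import dataclass
from enum import Enum


class Oversight(Enum):
    IN_THE_LOOP = "in"       # a human must approve every action
    ON_THE_LOOP = "on"       # the agent acts unless a human vetoes
    OUT_OF_THE_LOOP = "out"  # the agent acts; humans review at the macro level


@dataclass
class Recommendation:
    perception: str     # what was observed
    comprehension: str  # what it means
    projection: str     # what is expected to happen next
    action: str         # proposed course of action


def decide(rec, mode, human_approves=None, human_vetoes=None):
    """Return "execute" or "hold" for a recommendation under a given mode.

    human_approves and human_vetoes are optional callbacks standing in for
    a real review workflow; each takes the recommendation and returns a bool.
    """
    if mode is Oversight.IN_THE_LOOP:
        # Default to holding when no explicit human approval is available.
        approved = bool(human_approves and human_approves(rec))
        return "execute" if approved else "hold"
    if mode is Oversight.ON_THE_LOOP:
        # Default to executing unless a human actively intervenes.
        vetoed = bool(human_vetoes and human_vetoes(rec))
        return "hold" if vetoed else "execute"
    return "execute"


rec = Recommendation(
    perception="disk usage rising two points per hour",
    comprehension="log volume grew after the last deployment",
    projection="volume reaches 90 percent in about six hours",
    action="rotate logs and expand the volume",
)

print(decide(rec, Oversight.IN_THE_LOOP))                                  # hold
print(decide(rec, Oversight.IN_THE_LOOP, human_approves=lambda r: True))   # execute
print(decide(rec, Oversight.ON_THE_LOOP))                                  # execute
print(decide(rec, Oversight.OUT_OF_THE_LOOP))                              # execute
```

Keeping the recommendation structure identical across all three modes means that moving an agent from one stage to the next changes only the gate, not the explanation a human sees when reviewing its decisions.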
As the complexity and scale of IT operations grow, so does the risk of inaction. AIOps offers the vital support needed to transform overwhelmed operations into efficient, proactive systems capable of withstanding even the most daunting challenges. Every moment of delay increases vulnerability to costly failures, missed opportunities, and disrupted human lives.
By leveraging AIOps, organizations can significantly reduce downtime, optimize resource allocation, and enhance overall system performance. These benefits directly translate to improved mission outcomes, whether in government agencies, healthcare systems, or critical infrastructure.
Rather than replace human insight, AIOps empowers it. AIOps creates a strong partnership between human expertise and AI, allowing teams to make faster, more informed decisions while reducing the cognitive strain of managing countless data points. By using the Endsley model to deconstruct the human decision-making process, organizations have a blueprint for meshing together AI with human insights to form this human-AI partnership. The future of IT operations lies in evolving this partnership, with AI agents handling a number of routine tasks and decisions and allowing operators to focus on critical mission objectives.
Enhancing human decision making with AIOps solutions and agents is a necessity. To meet the growing challenges of our day and ensure the systems that power our nation remain resilient, we must learn to rely on them. Building trust will take time and the careful, gradual relinquishing of control. By understanding ourselves and our own cognition, we begin the journey of shaping this future partnership with AI and take the first step to overcoming the challenges of IT operations today.
is a research development leader in Booz Allen's Chief Technology Office. He specializes in intelligent automation, distributed systems, cloud architecture, and DevSecOps methodologies.
provides expertise in psychology, health, and cognitive performance to the Booz Allen Human Performance team.
is a generative AI strategist and product management leader with Booz Allen's Chief Technology Office.
is a chief engineer on Booz Allen's Civil Tech team, with expertise in enhancing large-scale software delivery through reuse, improving developer experience, and integrating AI throughout the software development lifecycle.
Booz Allen's annual publication dissecting issues at the center of mission and innovation.