July 9, 2024
Accelerating Issue Resolution and Product Innovation Through AIOps
As companies accelerate digital transformation, they create complexity that often results in more frequent service disruptions. Incidents impacting customers are up 43% over the past year, and each one costs close to $800,000. Beyond the financial hit, outages can hurt customer experience, damage brand perception, slow innovation, and put regulatory compliance at risk.
Nearly half of CIOs consider artificial intelligence (AI) and machine learning (ML) among their organizations’ top strategic priorities for 2024 [1]. Adopting AIOps solutions, which integrate these technologies, can enhance operational resilience and efficiency. By leveraging ML and analytics, AIOps streamlines IT operations through issue detection, actionable insights, and automated remediation. This approach leads to increased employee productivity, faster incident resolution, and fewer overall operational disruptions.
Increase innovation velocity through scaled service ownership
Many organizations have embraced service ownership as a DevOps best practice where teams take responsibility for supporting the software they deliver throughout its lifecycle. However, as the pace of product delivery accelerates, organizations struggle to scale operations without increasing headcount. AIOps offers a potential way out of this dilemma, helping teams expand operational capacity with the staff they already have.
AIOps offers capabilities like noise reduction and event-driven automation that filter out non-critical issues and route critical alerts to the right teams without manual intervention. As a result, DevOps teams can focus on the highest-value tasks, and can work in a flow state with minimal disruption to business-critical work.
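To make the idea concrete, here is a minimal sketch of noise filtering and routing. It is a simple rule-based stand-in, not how any particular AIOps product works, since real platforms typically rely on ML rather than fixed rules. The severity levels, the routing table, and the team names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical severity levels and routing table, invented for this sketch.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}
ROUTES = {"payments": "payments-oncall", "booking": "booking-oncall"}
DEFAULT_TEAM = "operations-center"


@dataclass(frozen=True)
class Alert:
    service: str
    severity: str
    summary: str


def triage(alerts, min_severity="error"):
    """Drop low-severity and duplicate alerts, then group the rest by owning team."""
    threshold = SEVERITY_RANK[min_severity]
    seen = set()
    routed = {}
    for alert in alerts:
        if SEVERITY_RANK[alert.severity] < threshold:
            continue  # below the threshold: treated as noise
        key = (alert.service, alert.summary)
        if key in seen:
            continue  # duplicate of an alert that was already routed
        seen.add(key)
        team = ROUTES.get(alert.service, DEFAULT_TEAM)
        routed.setdefault(team, []).append(alert)
    return routed
```

Calling `triage` on a batch of alerts returns a dictionary keyed by team, so each on-call group sees only the deduplicated, above-threshold alerts for the services it owns. Anything without a known owner falls back to the default team.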
IAG Loyalty, which manages the loyalty programs for IAG’s airlines—British Airways, Iberia, and Aer Lingus—and over 125 global brand partners, prioritizes sustainable growth as it expands its partnerships and product offerings. Colin Lewis, Head of Core Engineering at IAG Loyalty, says:
“We don’t want the growth of our team and complexity of our operations to grow linearly with the growth of our business. We need to do more without scaling our operating costs.”
Following a DevSecOps approach, IAG Loyalty turned to AIOps to drive faster resolution and lower operating costs. With AIOps as the first line of defense, they reduced noise on certain services by as much as 70%. They also use AIOps to surface actionable insights, including context from past incidents, to determine probable root cause and expedite resolution. “This eliminates manual effort and gives our team more time for revenue-generating work,” says Colin Lewis.
Deliver seamless customer experiences through incident management transformation
AIOps enhances an organization’s ability to quickly detect major incidents. In critical situations that require human intervention, ML can rapidly surface key information, facilitating fast, fact-based decision making, and helping drive faster resolutions. By analyzing historical incident data, AIOps also provides valuable context for accurate triage, such as whether an incident has occurred before or the probable origin of the incident. These kinds of insights separate noise from signal quickly and effectively, and give teams a leg up in the moments that matter most.
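As a rough illustration of how historical context can inform triage, the sketch below looks up past incidents whose summaries resemble a new one. It uses plain word overlap (Jaccard similarity) as a stand-in for the ML models a real AIOps platform would apply. The record fields `summary` and `root_cause` and the 0.5 threshold are assumptions made for this example.

```python
def tokens(text):
    """Split a summary into a set of lowercase words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two summaries, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similar_incidents(new_summary, history, threshold=0.5):
    """Return (score, incident) pairs at or above the threshold, best match first."""
    scored = [(jaccard(new_summary, past["summary"]), past) for past in history]
    matches = [pair for pair in scored if pair[0] >= threshold]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)
```

A responder could use the top match to answer the two questions raised above: whether this incident has happened before, and what its probable origin was last time.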
TUI, the world’s largest tourism organization, has transformed incident management to win customer trust. In the travel industry, it’s critical to ensure reliable services like booking systems, payment processing, and customer service requests. The team at TUI embraced AIOps to accelerate triage and minimize the customer impact when things go wrong. “If there is an issue, AIOps tells us exactly where the problem is. On average, the time to recover from an incident is at least 30% quicker. With automated recovery it can be 90% quicker. If we already know a scenario, AIOps learns and responds by executing automated scripts to recover from service disruptions. Customers don’t even notice that we’ve had an issue,” says Yasin Quareshy, Head of Technology at TUI.
Operationally mature organizations embed AI across their end-to-end incident lifecycle to get ahead of issues before they start. Vodafone, a leading telecommunications company, creates customer value by focusing on service availability and maximizing the productivity of its engineering teams. When it comes to incident management, Vodafone has embraced a culture of continuous learning, improvement, and prevention. AIOps plays a role in resolving issues before they impact customers. Ahmed Elsayed, UK CIO & Digital Engineering Director at Vodafone, says, “With incident management, people usually focus on how to recover. But what’s also interesting is after an incident, using AI technologies, can I correlate previous incidents and develop auto recovery for future incidents? Can I reach a state where I proactively detect a problem before it impacts our customers?”
The Vodafone team also sees a future where incidents are completely avoided with the help of AIOps. “With AIOps, I imagine that we reach a state where our ops teams are actually automation engineers. Rather than managing the fire, they would build the right automation so it never happens at all. Maybe a few quarters from now, we won’t be discussing how many incidents we had, or the mean time to recover, but how many incidents were avoided,” says Ahmed Elsayed.
Reduce costs through operations center modernization
The complexity of modern digital infrastructure poses significant challenges for operations centers. The traditional “eyes-on-glass” approach, coupled with the sheer quantity of data coming from enterprise environments, disparate tools, and manual processes, often results in incidents going undetected. When issues are identified, finding the right subject matter expert can be difficult, leading to unnecessary escalations and incidents routed to the wrong teams. Even a well-staffed operations center can miss incidents until they escalate into major business risks, resulting in unhappy customers and significant revenue loss.
AIOps capabilities can help modernize operations centers by providing a unified view of IT health. Noise reduction capabilities automatically correlate many different signals to prevent a “wall of red.” Further, event-driven automation allows smaller teams to do more with less despite growing incident volumes. Together, these capabilities help resolve incidents faster, with fewer resources and at lower cost to the business. According to McKinsey, incident triage used to take hours and often involved having hundreds of people on call; now, with event correlation, companies can reduce the mean time to identify incidents by 50 to 75 percent [2].
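Event correlation can be sketched in a few lines. The example below groups events that arrive close together in time, so that a burst of related signals shows up as one candidate incident instead of many separate alerts. Time proximity is only one of several signals a production system would weigh. The `timestamp` field (in seconds) and the 120-second window are assumptions chosen for illustration.

```python
def correlate(events, window_seconds=120):
    """Group events that occur within window_seconds of the previous event in a group."""
    groups = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        last_group = groups[-1] if groups else None
        if last_group and event["timestamp"] - last_group[-1]["timestamp"] <= window_seconds:
            last_group.append(event)  # close enough in time: same candidate incident
        else:
            groups.append([event])  # gap too large: start a new group
    return groups
```

With this approach, five alerts firing across two separate bursts collapse into two groups for an operator to review, which is the effect that keeps a dashboard from turning into a wall of red.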
TD Bank is a top-10 North American financial institution serving 27.5 million customers worldwide. The company uses AIOps to drive noise reduction and help its operations center focus on what’s most important. Chris Conklin, Technology Executive – Enterprise AIOps at TD Bank, says, “The way we’re tackling this is by reducing the middleman. When something is identified, we can quickly route an incident to the subject matter expert or responsible team, while keeping the operations center in the loop. By introducing automation, the operations team can help drive closure by collecting more information or executing actions.” Through these actions, TD Bank has seen a 25% improvement in proactively identifying and responding to customer-impacting events.
The power of AIOps
As organizations continue to navigate the complexities of digital transformation, the need for operational resilience is paramount. The significant rise in service disruptions and their associated costs underscores what is at stake for the business. AIOps offers a powerful means to automate and improve IT operations with fewer resources.
Real-world examples from industry leaders demonstrate how AIOps can deliver considerable business value through increased innovation, improved customer experiences, and reduced costs. The use cases in this article showcase how AIOps effectively supports critical business initiatives like scaling service ownership, transforming incident management, and modernizing operations centers.
As digital environments become increasingly complex, AIOps stands out as a crucial tool for enhancing operational efficiency and resilience. Embracing these technologies positions organizations to thrive in an ever-evolving digital landscape.
[1] State of the CIO, 2024: https://1624046.fs1.hubspotusercontent-na1.net/hubfs/1624046/R-ES_State%20of%20the%20CIO_2024%20(1).pdf
[2] McKinsey, IT resilience for the digital age: https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/tech-forward/it-resilience-for-the-digital-age
About the Author
Lisa is a Senior Customer Marketing Manager at PagerDuty.