
Being Operationally Mature Can Save You Millions

by Jeffrey Hausman October 8, 2024 | 4 min read

On July 19th, a widespread technical failure crippled operations across industries, resulting in lost revenue, wasted operating costs, and damaged customer trust. For businesses that had built trust by providing reliable and resilient services, this had both an immediate and a lasting impact.

We estimate that the July 19th outage (the ‘Outage’) cost our customers billions of dollars, with hourly downtime costs in the millions for some companies.1 Unfortunately, the consequences of the Outage were not always measured in mere hours, as the residual impact reverberated for many days after the main event.

The Impact Was Not Equal Across Companies

Organizations that were more operationally mature recovered more quickly and experienced 60% less business impact than their peers.2 PagerDuty data shows that operationally mature customers responded to the Outage both more quickly and more efficiently, with mean time to acknowledge (MTTA) up to 30% faster, and proactively remediated residual issues before they arose. Our data also revealed that teams leaning more heavily into the PagerDuty Operations Cloud, using noise reduction to focus the team’s efforts and automation to streamline and orchestrate the overall response, reached resolution more than 60% faster than their peers and returned quickly to their normal course of work. For these teams, that translates to millions in potential savings from just one event, and it helps establish a reputation for resilience and reliability in the eyes of their customers.

As companies move forward from this experience, it is critical that they evaluate whether they are prepared for the next event. In an interconnected world that relies on technical infrastructure that is both aging and growing more complex with advanced technologies like generative AI, it is not a matter of ‘if’ but ‘when’ the next outage happens.

Preparedness will not materialize on its own during a crisis like the Outage. Instead, it requires companies to prioritize ongoing investment in operational maturity, including their operational platform, processes, and people on the front line.

Operational Platform
During an outage, companies need a platform they can trust and rely on to be up and running when the rest of the world is not operational. The PagerDuty Operations Cloud is that best-in-class platform with unrivaled reliability.

On July 19th, the PagerDuty Operations Cloud demonstrated resilience: our data shows that despite a sharp increase in transaction volume over the norm (Incident Workflows up 1,400%), our platform performed well within its service level agreements. This allowed PagerDuty to play a crucial role in helping customers identify and resolve time-critical problems, get back online as quickly as possible, and minimize the financial and reputational impact on their business.

People and Process
With the PagerDuty Operational Maturity Model, customers can easily assess their current level of maturity as well as view top recommendations for improvement driven by peer-based benchmarking.3 The Operational Maturity Model contains key categories such as people management, noise reduction, and automation to help customers understand how prepared their teams are to manage incidents and time-critical work efficiently. This makes operational maturity at scale across tens or hundreds of teams a seamless part of everyday operations so organizations are always on their front foot.

Outside of the response effort, a critical part of the Operational Maturity Model is continuous learning. More mature organizations do not leave experiences like the Outage, or even incidents with far less news coverage, in the rearview mirror without review and analysis. They use analytical insights and post-incident audits to identify areas of strength and resilience in their operations, as well as opportunities to mature. The PagerDuty Operations Cloud includes these analytics and post-incident review capabilities, which can help companies improve their operational maturity and set themselves apart from their peers (you can learn more here).

Get started with the PagerDuty Operations Cloud today, or learn more about how you can increase your organization’s maturity with our Operational Maturity Model.

 

1 Numbers calculated based on PagerDuty customers who had greater than 100% increase in high urgency incidents, the relative magnitude of the increase, the time that these customers remained in their elevated incident response state, and their total annual revenue and operating expenses.

2 These numbers were calculated by comparing customers who met or exceeded their 180-day average mean time to resolution during the Outage with all other customers.

3 Recommendations based on companies in similar industries and of similar size.