Build Operational Resilience with Generative AI and Automation

by Débora Cambé November 28, 2023 | 5 min read

For modern enterprises aiming to innovate faster, gain efficiency, and mitigate the risk of failure, operational resilience has become a key competitive differentiator. But growing complexity, noisy systems, and siloed infrastructure have created fragility in today’s IT operations, making the task of building resilient operations increasingly challenging. The burden is on the CIO to steer and scale modernization and digital transformation initiatives to deliver reliable customer experiences, faster. But how do you accomplish this with so many challenges and constraints?

Integrating AI and automation into the right operational processes is key to achieving this transformation. Customers have looked to us for years to help them address urgent, critical work. That’s why we’ve continued to invest in AI and automation capabilities that make operations more efficient, scalable, and resilient.

This week, we are excited to announce enhancements to the PagerDuty Operations Cloud that will help teams transform operations and innovate faster. Let’s take a look at some of the latest advancements that help make this a reality. 

Remove guesswork and toil during an incident

We’re really excited to announce PagerDuty Advance, a set of capabilities that brings generative AI to the PagerDuty Operations Cloud. It acts as an expert partner to PagerDuty users, providing them with analysis for more efficient decision-making, authoring and executing automation, creating timely status updates, drafting incident postmortems, and more. PagerDuty Advance supports human-led efforts and saves companies time and money by eliminating repetitive and time-consuming operations tasks and reducing the effort and skill level needed to perform the remaining work. 

The most recent innovation from this set of capabilities is AI Assistant, which was built to help teams manage incidents from event to resolution. This Slack-first generative AI chatbot provides responders with helpful insight at every step of the incident lifecycle. The feature anticipates common diagnostic questions and guides users through troubleshooting steps with actionable intelligence. By surfacing the answers to questions like “What happened?”, “What changed?”, and “What’s the customer impact?”, it empowers responders to resolve incidents faster and more efficiently.

AI Assistant joins a growing list of genAI features powered by PagerDuty Advance that are available for early access today, including Automation Co-Author, AI-generated status updates, and AI-generated postmortems. 

Break through silos for better major incident management

Many existing AIOps solutions are too heavy on technology: they require endless training and maintenance yet cannot scale past an initial set of applications, which hampers time to value. The pace of modern operations demands a new approach that can extend the value of AIOps to all teams, applications, and services across the business, regardless of workflow or toolchain. That’s why PagerDuty designed a differentiated approach to AIOps that supports both central IT and distributed teams at the same time. Whether you’re driving a NOC modernization initiative, looking at implementing L0 automation, or aiming to improve major incident management, PagerDuty’s AIOps solution can deliver results in days, not months. From one-click ML-powered noise reduction to event-driven automation, PagerDuty provides both service-level and global (cross-service) capabilities so that any team can reduce incidents and remove toil from their workflows. This approach is also why PagerDuty has been named a Leader in the Forrester Wave for Process-Centric AIOps.

We introduced Global Event Orchestration earlier this year so that a single SRE could control enrichment, route events, or trigger self-healing actions based on event conditions across any or all services within PagerDuty. Customers have been asking for the same flexibility in our noise reduction feature set, so we’re happy to announce that Global Alert Grouping is now generally available. Organizations can now reduce noise across one or multiple services by using Content-based Alert Grouping with a flexible time window. With this new AIOps feature, teams can experience fewer incidents, improve MTTR by distilling the signal from the noise, and better understand the scope of an incident.

Global Alert Grouping configuration page
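To make the idea of content-based grouping with a time window concrete, here is a minimal sketch in Python. It is not PagerDuty's implementation or API: the `Alert` and `Incident` classes, the `group_alerts` function, the `attributes` dictionary, and the choice to measure the window from the most recent alert in a group are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical alert shape; real alert payloads will differ.
    service: str
    attributes: dict
    created_at: datetime


@dataclass
class Incident:
    key: tuple
    alerts: list = field(default_factory=list)

    @property
    def last_seen(self) -> datetime:
        return self.alerts[-1].created_at


def group_alerts(alerts, match_fields, window):
    """Group alerts whose chosen fields match, regardless of service,
    as long as each new alert arrives within `window` of the previous
    alert in the same group (an assumed interpretation of the window)."""
    latest_by_key = {}
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.created_at):
        key = tuple(alert.attributes.get(name) for name in match_fields)
        current = latest_by_key.get(key)
        if current is not None and alert.created_at - current.last_seen <= window:
            current.alerts.append(alert)
        else:
            # No open group for this content, or the window has lapsed.
            current = Incident(key=key, alerts=[alert])
            latest_by_key[key] = current
            incidents.append(current)
    return incidents


t0 = datetime(2023, 11, 28, 12, 0)
sample = [
    Alert("checkout", {"class": "disk"}, t0),
    Alert("search", {"class": "cpu"}, t0 + timedelta(minutes=1)),
    Alert("payments", {"class": "disk"}, t0 + timedelta(minutes=2)),
    Alert("checkout", {"class": "disk"}, t0 + timedelta(minutes=20)),
]
incidents = group_alerts(sample, ["class"], timedelta(minutes=5))
print([len(i.alerts) for i in incidents])  # [2, 1, 1]
```

In this sketch, the two "disk" alerts from different services that arrive two minutes apart collapse into one incident, while the one that arrives 18 minutes later falls outside the five-minute window and opens a new incident.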

Shift towards a more proactive operations posture

During an incident, responders have to swivel between tools and look across services, rifling back in time through old incidents to understand what may have caused an issue and how to alter the response accordingly. With no historical lens to examine this mountain of data, incidents can drag on. Organizations are also looking to automation to help reduce both the volume of data and the number of incidents. But without key historical information built into that automation, the reduction is limited to what is happening right now, with no additional context. The result is higher MTTR and greater customer impact.

Event Orchestration Variables enable customers to build intelligent automation that helps inform other tools and processes for a faster, more targeted incident response that can be standardized across the organization for better cross-team collaboration. Users can construct precise, event-driven automation that is kicked off based on historical context. This capability helps teams learn from past issues and prevent similar ones from recurring, ultimately shifting towards a more proactive operations posture. The result is operational maturity at scale, with expected, repeatable outcomes across the technology ecosystem. This capability supercharges PagerDuty AIOps and acts as connective tissue for all of the PagerDuty Operations Cloud, running Incident Workflows, populating Custom Fields, and engaging Automation Actions for smarter end-to-end event-driven automation. AIOps customers can sign up for Early Access.

Global Orchestration configuration screen showing the Orchestration Variables configuration sub-menu
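As a rough illustration of automation that is kicked off based on historical context, the Python sketch below remembers when matching events were seen and signals that an automated action should run once the same kind of event recurs often enough within a time window. This is a generic pattern, not PagerDuty's Event Orchestration Variables API: the `RecurrenceTracker` class, its `record` method, the event key, and the threshold and window values are all hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class RecurrenceTracker:
    """Keeps a rolling history of event timestamps per key (hypothetical helper)."""

    def __init__(self, window: timedelta, threshold: int):
        self.window = window
        self.threshold = threshold
        self._seen = defaultdict(deque)

    def record(self, key: str, at: datetime) -> bool:
        """Store the event and return True when the key has occurred at
        least `threshold` times within `window`, meaning automation
        should be triggered."""
        times = self._seen[key]
        times.append(at)
        # Drop history that is older than the window.
        while at - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold


tracker = RecurrenceTracker(window=timedelta(hours=1), threshold=3)
start = datetime(2023, 11, 28, 9, 0)
decisions = [
    tracker.record("checkout-db:disk_full", start + timedelta(minutes=m))
    for m in (0, 10, 20, 180)
]
print(decisions)  # [False, False, True, False]
```

Here the third occurrence within an hour would trigger the automated action, while the occurrence three hours later starts a fresh count because the earlier history has aged out of the window.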

Conclusion

In a world where applications and digital services are your gateway to revenue, your growth and reputation are only as good as your team’s ability to handle interruptions in your digital ecosystem. Rapid resilience, the ability to bounce back quickly from failure, has become an essential capability for modern enterprises. PagerDuty is an end-to-end platform that empowers teams to drive efficiency and lower the total cost of operations by leveraging AI and automation at scale. Try it free today.