PagerDuty Operations Cloud Fall Launch 2023

by Inga Weizman October 30, 2023 | 8 min read

Across the business landscape, 2023 has been called the “year of efficiency.” Organizations have had to deliver more growth and innovation, but with tighter budgets and headcount than in prior years. CIOs have needed to build strategies to mitigate the risk of operational failure and protect their brand’s customer experience. These forces have led many organizations to turn to AI and automation to increase productivity at scale, as the ability to work in modern and efficient ways has become a competitive advantage. 

To help our customers gain that edge, PagerDuty is launching new features across the PagerDuty Operations Cloud that streamline operational processes and critical, unplanned work, all with the help of AI and automation. These new features across AIOps, Incident Response and Process Automation enable our customers to drive faster growth, compress costs, and build customer trust.

And in the spirit of efficiency, we’re enabling our customers to consolidate their tools and datasets as they continue their journey of digital transformation. When organizations run efficiently—whether it’s a tech stack, process flow or budget—they can protect their business’s bottom lines and empower their DevOps and SRE teams to focus on the most strategic, innovative and fulfilling work possible.

Let’s dive into these new announcements and how they work for our customers.

Supercharge your team productivity and innovation with AI

The time to invest in AIOps is now. In their AIOps: Growing Adoption and Best Practices report, IDC predicts that “by 2026, 90% of large enterprise CIOs will use AIOps solutions to drive automated remediation and workload placement decisions that include cost and performance metrics, improving resiliency, and agility.” By leveraging PagerDuty’s innovative AI and AIOps capabilities, teams can reduce overall interruptions by 87%, providing them with more focus and productivity in their day. This means that businesses can move faster, build more business-wide automation, and speed up learnings—all with existing or reduced headcount.

Noise from alerts can be overwhelmingly disruptive. Global Alert Grouping helps centralized IT teams–such as NOCs and SREs–further reduce the noise across services by grouping alerts based on custom rule sets. This helps teams better understand the incident scope, resulting in faster resolutions and less downtime, and creating more time for innovation. 
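To make the idea of rule-based grouping across services concrete, here is a minimal sketch in Python. It is not PagerDuty's implementation: the alert fields (`service`, `component`, `summary`) and the `group_alerts` helper are hypothetical, and the "rule" is simply a list of fields that must match for two alerts to land in the same group.

```python
from collections import defaultdict


def group_alerts(alerts, rule_fields):
    """Group alerts that share the same values for every field in rule_fields.

    Both the alert shape and the rule format are assumptions made for
    illustration; they do not reflect PagerDuty's actual data model.
    """
    groups = defaultdict(list)
    for alert in alerts:
        # Build the grouping key from the fields named by the rule.
        key = tuple(alert.get(field) for field in rule_fields)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts raised by three different services.
alerts = [
    {"service": "checkout", "component": "db-01", "summary": "High latency"},
    {"service": "payments", "component": "db-01", "summary": "Connection errors"},
    {"service": "search", "component": "cache-02", "summary": "Cache miss spike"},
]

# Rule: alerts that point at the same component belong together,
# regardless of which service raised them.
grouped = group_alerts(alerts, ["component"])

for key, members in grouped.items():
    print(key, [a["service"] for a in members])
```

In this sketch the two alerts that point at `db-01` end up in one group even though they came from different services, which is the kind of cross-service view a centralized team would want when scoping an incident.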

While many organizations know that automation is a must-do, not all know how to get started. AI-generated Runbooks (public beta) help quickly build more automation for streamlined operational processes. By using natural language processing (NLP), text-based prompts are transformed into automation scripts to help you get started and build more automation faster. This democratizes automation authoring across the organization for IT operations, SRE and platform engineering teams, and lends a hand to less experienced users.

But the improvements don’t stop after an incident is resolved, as the savviest teams are continuously learning and adjusting. AI-generated Incident Postmortems (EA) help save time and remove toil from postmortems. AI-generated summaries create a comprehensive report of what happened when, how it was resolved, and key actions for next time. This new capability automatically collects and collates full incident data so DevOps and SRE teams can focus on the learnings, instead of the time-consuming tasks of copying and pasting all the incident-related data from logs, Slack and tickets. 

During incidents, keeping stakeholders informed is critical, but also time- and resource-consuming. Some organizations have multiple people focused solely on stakeholder updates during large-scale incidents. AI-generated Status Updates (EA) enable teams to generate status updates with just a few clicks, making it easier to keep internal stakeholders and executives in the loop. Leveraging generative AI for status updates means you can reduce the number of people dedicated to updates, reducing costs and freeing up time for your team. Both AI-generated Incident Postmortems and Status Updates also help your DevOps and SRE teams adopt best practices and maintain them consistently across services.

Scale operational efficiency with event-driven automation to trigger intelligent remediation

CIOs are continuously seeking ways to optimize their operational processes in the name of efficiency. It’s no longer a question of when, but how to leverage automation across your entire organization. According to Gartner, “by 2027, 75% of enterprises will combine their siloed automation initiatives to improve overall value, which is a significant increase from fewer than 10% in 2022.” Organizations are widely adopting automation into their overall operations processes so they can move faster and have access to relevant, real-time actionable data for better decision-making. With the amount of data that organizations are handling, it’s no longer efficient or possible for humans to sift through manually—they need the power of automation to help them resolve issues faster with little to no human intervention. When humans are involved, they need to be empowered with relevant data that can help them resolve issues faster. 

Incident response needs to be highly coordinated, collaborative, and standardized. When managing a critical issue, there is no time to go on the hunt for people, historical data or information. Event Orchestration variables (EA) empower SRE teams to build intelligent automation that informs other tools and processes. The result is faster, more targeted incident response that can be standardized across the organization for better cross-team collaboration.

Incidents are not always unique or uncommon; in fact, many are similar by nature and overlap in learnings and relevant institutional knowledge. PagerDuty Runbook Automation enables PagerDuty AIOps and human responders to trigger automation for diagnostics and remediation of well-understood incidents. Leveraging the synergies of the Runbook Automation Add-On and AIOps helps resolve incidents up to 95% faster by automating repetitive tasks, freeing up specialists’ time, and allowing SREs, platform engineers, and enterprise architects to focus on more complex incidents. Runbook Automation Add-On also supports automation use cases for DevOps, ITSM, self-service IT and event-driven automation. This helps customers reduce planned downtime by as much as 85% and support costs by as much as 55%.

Build resilience with a platform that fits with the way you work 

The uncomfortable truth is that “everything fails all the time.” Incidents will happen. Acknowledging that reality and preparing for it is how you can mitigate the risk and severity of the impact on your customers and your teams. Using online revenue data, Gremlin calculated that a single minute of downtime for a top e-commerce site can cost $200,000 or more of lost revenue. The buck doesn’t stop there: once you consider the negative impact on the brand, the total future cost is much higher. Organizations that find ways to reduce the impact of downtime will ultimately gain a competitive advantage by continuing to maintain and improve the customer experience. With this latest set of enhancements and features, we’re helping our customers save costs and gain more value and better efficiency from their tools. With the added flexibility and customization, customers can better optimize operational processes according to their needs.

Today, organizations are seeing more data than ever, which can be challenging to manage because it is dispersed across various tools and systems. Extracting and consolidating this data can be a labor-intensive process. That’s why PagerDuty Analytics offers a convenient out-of-the-box Analytics Dashboard (EA) and scheduled Analytics Emails (Limited Customer Preview). Together they give visibility into how each metric is performing over time and into best-in-class benchmarks, which helps teams ask better questions and plan for major incidents. They deliver streamlined key performance metrics and data to the right stakeholders to help them continuously improve operational efficiency across their teams. Customers engaging with PagerDuty Analytics saw their mean time to acknowledge (MTTA) improve by 28%, as well as more equitable distribution of work and consistent response hours, equating to saving 100 hours of work time per year, per team.
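For readers less familiar with the metric, mean time to acknowledge is the average gap between an incident being triggered and a responder acknowledging it. The sketch below shows that arithmetic in Python under stated assumptions: the `triggered_at` and `acknowledged_at` field names and the sample timestamps are hypothetical, and this is not how PagerDuty Analytics computes or stores the metric.

```python
from datetime import datetime


def mean_time_to_acknowledge(incidents):
    """Return the mean seconds between trigger and acknowledgement.

    Incidents that were never acknowledged are skipped. Returns None
    when there is nothing to average. Field names are illustrative only.
    """
    deltas = [
        (inc["acknowledged_at"] - inc["triggered_at"]).total_seconds()
        for inc in incidents
        if inc.get("acknowledged_at") is not None
    ]
    return sum(deltas) / len(deltas) if deltas else None


# Hypothetical incidents: acknowledged after 2 minutes, after 4 minutes,
# and one that was never acknowledged.
incidents = [
    {
        "triggered_at": datetime(2023, 10, 30, 9, 0, 0),
        "acknowledged_at": datetime(2023, 10, 30, 9, 2, 0),
    },
    {
        "triggered_at": datetime(2023, 10, 30, 14, 0, 0),
        "acknowledged_at": datetime(2023, 10, 30, 14, 4, 0),
    },
    {
        "triggered_at": datetime(2023, 10, 30, 18, 0, 0),
        "acknowledged_at": None,
    },
]

mtta_seconds = mean_time_to_acknowledge(incidents)
print(f"MTTA: {mtta_seconds / 60:.1f} minutes")
```

Tracking this number over time, rather than as a one-off, is what makes an improvement such as the 28% figure above measurable.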

Teams across DevOps, central IT or SRE need more flexibility to customize their tools, while also ensuring they are using best practices whenever possible, as they continue to refine their operational processes. Incident Workflow Enhancements enable organizations to customize their workflows while providing templates built on industry best practices for getting started quickly. By reducing manual steps through automated triggering of diagnostic and remediation processes using Runbook Automation, Incident Workflows help expedite resolution and ease the workload for incident responders.

Global, distributed teams need to collaborate efficiently during incidents. With Slack/Chat as a Contact Method (EA), you can quickly mobilize a response team without context switching or relying on SMS, which can be slow, unreliable and costly. This new feature translates into cost savings for global teams that can utilize WiFi instead of relying on cell coverage. It also saves time and allows teams to collaborate and communicate the way they want, with better precision.

We have also expanded our partnership with Google Cloud, and are a key integration partner in Google Cloud Personalized Service Health Integration. It sends proactive, customized and detailed alerts about Google service disruptions to get ahead of customer-impacting issues. The PagerDuty and Google Cloud partnership offers a vital platform for efficient cloud operations, aiding customers in responding to disruptions and ensuring smooth digital experiences.

Get Started With These New Features

The PagerDuty Operations Cloud is helping our customers redefine critical operations work with powerful AI and automation, so they can save costs, innovate faster, build stronger resilience, and scale their workforce. This latest PagerDuty platform release gives customers greater flexibility to work the way they want and delivers on the promise of fewer incidents and improved mean time to resolution (MTTR).

Learn more about all of these exciting new features, and sign up for early access to our GenAI capabilities.