4 New Ways to Improve Incident Management with Event Orchestration
In an era where efficiency and smart technology integration are key, 71% of technical leaders report their companies are expanding their investments in artificial intelligence (AI) and machine learning (ML) this year. With the sheer volume of data coming into the enterprise and the need for timely response, monitoring every incoming alert around the clock is impractical, and human vigilance alone is too imprecise. Instead, leveraging data-driven predictions based on how your system has historically operated can provide a more effective approach to managing and responding to incidents. This is where PagerDuty’s Event Orchestration comes into play.
Event Orchestration helps organizations do this by creating end-to-end event-driven automation. This capability enhances how organizations detect incidents, correlate root causes faster, and scale operational maturity across technical teams so they can work more cohesively and effectively.
With Event Orchestration variables, teams can build intelligent automation that integrates with other tools and processes, enabling a more targeted incident response that can be standardized across the organization. This new Event Orchestration capability helps you learn from past incidents and prevent their recurrence. The result is a more proactive approach to operations, with scalable and repeatable outcomes that benefit the entire technology ecosystem.
Let’s cover four ways you can use this new capability today.
1. Major incident management automation
Most organizations handle major incidents differently than they would lower priority incidents. A major incident might require different escalation paths, workflows, and internal processes. As such, automation during major incidents is often more bespoke.
With Event Orchestration variables, teams can now predict major incidents and adjust how the incident is managed via automation. For example, you can define a threshold of events that will kick off the correct major incident processes if the event criteria match what you know to signify a major incident. Event Orchestration doesn’t treat each event as a unique, separate instance. Instead, it uses events as the historical basis to make informed decisions about the state of the system over time.
This differs from traditional ways of kicking off major incidents via automation. Rather than looking at a single event as a signifier for a major incident, you can evaluate your system state more precisely by knowing what’s happened recently and how closely the circumstances match previous major incidents.
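To make the threshold idea concrete, here is a minimal Python sketch of counting matching events in a sliding time window. It is not PagerDuty's implementation or API: the `EventThreshold` class, the `is_major_candidate` criteria, the "checkout" service name, and the field names are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class EventThreshold:
    """Counts matching events inside a sliding time window (hypothetical helper)."""

    limit: int              # number of events that signals a major incident
    window_seconds: float   # how far back in time to look
    _seen: deque = field(default_factory=deque)

    def record(self, timestamp: float) -> bool:
        """Record one matching event; return True once the threshold is reached."""
        self._seen.append(timestamp)
        # Drop events that have aged out of the window.
        while self._seen and timestamp - self._seen[0] > self.window_seconds:
            self._seen.popleft()
        return len(self._seen) >= self.limit


def is_major_candidate(event: dict) -> bool:
    """Assumed criteria: a critical event on a hypothetical 'checkout' service."""
    return event.get("severity") == "critical" and event.get("service") == "checkout"
```

In this sketch, no single event triggers the major incident process. Only a run of matching events within the window does, which mirrors the idea of judging system state from recent history.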
2. Reactive automation
Many organizations lean on automated diagnostics or auto-remediation to give responders a head start. However, automation isn’t self-aware. It doesn’t know what has already been run against other events that recently came into the system. As a result, automation often attempts to run multiple times for similar events without actually providing any real insight or resolution to the problem.
Now, you can build automation that tracks whether diagnostics have been run and changes automation paths based on the response. For example, if a diagnostic has been run recently for an Event Orchestration, the automation understands that it doesn’t need to run the diagnostics again. Instead, it kicks off an additional automation sequence such as auto-remediation.
This reactive automation (or automation intelligently triggering more automation) gives organizations more flexibility and control over when automation happens and what to do with the feedback from those sequences.
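The decision described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's behavior: the `next_action` function, the ten-minute freshness window, and the action names are hypothetical, and the dictionary stands in for whatever state the orchestration keeps between events.

```python
from typing import Dict

# Assumed: diagnostics count as "recent" for ten minutes.
DIAGNOSTIC_TTL_SECONDS = 600


def next_action(service: str, now: float, last_diagnostic: Dict[str, float]) -> str:
    """Pick the next automation step for an incoming event on `service`."""
    last_run = last_diagnostic.get(service)
    if last_run is not None and now - last_run <= DIAGNOSTIC_TTL_SECONDS:
        # Diagnostics already ran recently, so move on to remediation.
        return "run_remediation"
    # No recent diagnostics: run them and remember when.
    last_diagnostic[service] = now
    return "run_diagnostics"
```

Because the state is keyed by service, a burst of similar events produces one diagnostic run followed by remediation rather than repeated diagnostics.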
3. Dynamic automation
Organizations want to run automation in an informed fashion, targeting the exact failed application or infrastructure. However, if you can only access a single event, it’s challenging to know what part of your stack failed and run automation accordingly.
Event Orchestration allows you to extract and store information about which parts of your stack have had issues so you can enrich that information into future automation for more precise targeting.
For example, you can set a variable that extracts data from a payload. If the event payload matches a certain circumstance, such as a Kubernetes event, you can populate the node information. Then, you can create an automation sequence to identify and dynamically restart that exact failed node.
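As a rough illustration of that flow, the Python sketch below pulls a node name out of an event and builds a targeted restart request. The payload shape (`custom_details`, `source`, `node`) and the `restart-node` job name are assumptions for this example only, and the code does not call any PagerDuty API.

```python
from typing import Optional


def extract_failed_node(event: dict) -> Optional[str]:
    """Return the node name if the event looks like a Kubernetes event, else None."""
    details = event.get("custom_details", {})
    if details.get("source") != "kubernetes":
        return None
    return details.get("node")


def build_restart_job(node: str) -> dict:
    """Describe a hypothetical automation job that restarts exactly one node."""
    return {"job": "restart-node", "arguments": {"node": node}}


# Example usage with a made-up event.
event = {"custom_details": {"source": "kubernetes", "node": "worker-7"}}
node = extract_failed_node(event)
job = build_restart_job(node) if node else None
```

The stored value plays the role of the variable: it is set only when the payload matches, and later automation reads it instead of guessing which part of the stack failed.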
4. Self-configuring automation
When something fails, making a rough estimate of what went wrong isn’t good enough. Responders must have the right triage information immediately to determine root cause and jumpstart resolution.
In these instances, variables help organizations surface that triage information and pinpoint the failure in a system with automation that self-configures throughout the response process. For example, when an event is related to a piece of infrastructure currently experiencing an issue, automation configures the rule and adds key context like notes.
This new capability makes automation within PagerDuty more scalable and surfaces information as quickly as possible. It reduces the time required not only to resolve incidents but to create and deploy automation across a complex technical ecosystem.
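One way to picture the self-configuring example is a lookup of components with known, ongoing issues that adds a note to any related event. The Python sketch below assumes a hypothetical `annotate_event` helper and made-up field names (`component`, `note`); it illustrates the idea rather than how the product stores or applies this context.

```python
from typing import Dict


def annotate_event(event: dict, known_issues: Dict[str, str]) -> dict:
    """Return a copy of `event` with a triage note if its component has an open issue."""
    enriched = dict(event)  # leave the original event untouched
    component = event.get("component")
    if component in known_issues:
        enriched["note"] = (
            f"Related to ongoing issue on {component}: {known_issues[component]}"
        )
    return enriched
```

Events on healthy components pass through unchanged, so responders only see extra context when it is relevant.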
If you’re an existing PagerDuty AIOps customer interested in creating automation like this, watch this short how-to tour or Twitch demo by Principal Product Manager Frank Emery.
Not a PagerDuty AIOps customer? Try it today and build event-driven automation that will help you reduce toil and improve efficiency across the organization.