The Unplanned Show, Episode 1: Damon Edwards Rages Against the Ticket Machine

In this, the inaugural episode of “The Unplanned Show”, Dormain talks to Damon Edwards about the “capacity conundrum”: everyone is working so hard, yet everything takes too long and costs too much. We talk about the “coordination overhead” costs of getting unplanned work done, how generative AI both adds complexity and offers a way to accelerate automating as much as you can, and four steps to creating capacity.

“Any unplanned operational work is just the same as an unplanned incident… it’s still the same thing if someone’s waiting and now somebody’s interrupted and chances are you’re going to interrupt more than one person.”

Watch the episode here.

References:

  • PagerDuty Ops Guides
  • SRE Handbook

Summary with support from ChatGPT:
    The conversation features Damon Edwards, who currently works at PagerDuty. He co-founded Rundeck, a runbook automation tool that PagerDuty later acquired. Damon’s work focuses on expanding PagerDuty beyond on-call notifications into a comprehensive platform for operations work. They discuss the challenge of coordinating work across complex systems efficiently and the concept of “toil”: work that is necessary but doesn’t add enduring value to the business.

    “It’s a problem of saying that all work is appropriate to go through ticket queues… as the world gets more complex, we’re just making bigger problems for ourselves.”

    The conversation turns to how infrastructure management is changing as technology grows more complex and moves faster. They discuss how organizations need to adapt and think about their infrastructure more like developers do, especially now that generative AI is introducing additional complexity. Damon emphasizes that routing every type of work through ticket-based workflows is becoming increasingly costly and less effective. He highlights how hard it is to collaborate effectively within ticket queues and the drawbacks of breaking work up into many smaller tickets. Damon suggests that organizations need to explore alternative approaches to handling work, because traditional methods are no longer sustainable.

    “The classic modes of working through queues, it’s just very expensive… I think now we’ve hit this tipping point where organizations who figure this out are going to find so much more capacity that they even know was there.”

    The conversation continues with the challenges of ticket-based workflows, particularly for unplanned work. They highlight the constant hand-holding needed to shepherd tickets along and make sure the work actually gets addressed. They touch on the idea that the real issue may be coordinating tasks efficiently in real time, not filling out ticket forms. Damon emphasizes that this problem often goes unnoticed because it hides in plain sight: people see only their individual processes, not the cumulative impact on key experts. This lack of visibility into the bigger picture, and the friction it causes, reinforces the conventional wisdom that better planning and faster queue management will solve the problem. Damon contends that this approach has been tried for over two decades without yielding significant improvements.

    “It’s really insidious because you don’t see it from the mega view everybody sees it from the myopic views… they don’t see how it adds up.”

    Next, they discuss organizational design and the need to rethink how work is handed off within processes. Damon emphasizes the importance of minimizing partially done work and reducing handoffs between teams and experts. He stresses that automation is necessary to keep work out of human hands whenever possible. When human intervention is required, Damon argues it should be streamlined, with early diagnostics and pinpoint escalations that reduce the number of people involved in incident response or project work. He draws parallels between unplanned operational work and incidents, noting that both interrupt people and disrupt their normal tasks.

    “Now we’re not only talking about shifting left or compressing the timeline but now we’re talking about narrowing that blast radius of those escalations.”

    Damon returns to the importance of minimizing interruptions and handoffs in operational workflows. He acknowledges that achieving this goal ultimately requires organizational change, but that starting with automation can be a significant step forward. Damon encourages building self-service interfaces to reduce manual ticketing and handoffs between teams, and he highlights how automation improves response times and increases operational capacity. He also describes generative AI as a tool for automating tasks, offering a co-authoring experience for automation workflows. In his view, generative AI can help bridge the gap between experts and less experienced staff, making complex tasks accessible to a wider audience.

    Damon envisions generative AI as a virtual expert working alongside individuals to improve operational efficiency.

    “It’s like having that super smart, just because they’ve been around the block person next to you at all times. If you want to personify [generative AI] I think that’s kind of where this is going.”

    In the final part of the conversation, Damon emphasizes the challenge of handling growing workloads with the same resources. He calls for a shift in mindset away from the “Go-Go times” of hiring more people and toward doing more with the existing team. He highlights the importance of minimizing coordination costs in operational workflows, which means enabling people, removing the need for coordination, and providing the right context to guide decision-making. Damon also points to research in the safety sciences, which addresses coordination costs in high-consequence domains and parallels the challenges of operating large-scale internet systems. For further insight into revolutionizing operations, Damon recommends resources from PagerDuty, such as its Ops Guides, as well as foundational texts like the SRE books.

    “I think PagerDuty… is really starting to amp up our mission to help revolutionize operations again.”