A Guide to Incident Communications
With so many of us relying on a collection of different digital tools and services each day, in both our personal and professional lives, we trust that these services will function correctly when we need them. However, whether it’s an app, a website, or anything in between, it is no secret that incidents happen! Services may slow down or experience unplanned downtime, user or application data can be lost, and security breaches can occur.
The key is that when these incidents do happen, they are handled quickly and clearly communicated to users in order to set their expectations and keep them happy.
What Characterizes an Incident?
An incident is an issue that causes:
- A negative impact to the service or customer (e.g., a disruption in service or a reduction in quality)
- A loss of data
- A breach in security
For example, have you ever received an email from a company alerting you that your credit card information may have been compromised after shopping at their online store? Or maybe, while browsing a website you were notified that functionality may be slow due to increased traffic? Those are just two examples of different incidents in which a company communicates the error to its users.
Incidents can range from something small, such as slower page speeds caused by a surge in site visitors (e.g., an online shop with increased traffic during a big sale), to major incidents like a security breach that may compromise private user information or data. In both cases, alerting users to the incident, providing them with regular updates, and effectively resolving the issue can make all the difference in maintaining the loyalty of your users and the integrity of your service.
An incident is not something that may cause an issue in the future. However, if your team notices something that can cause an incident in the future, it’s always a good idea to diagnose and resolve the concern before it becomes an issue. In this case, communication to the users is not necessary.
How to Manage Your Incident Communications
Incident communication is critical to building trust with your users while improving the credibility of your service. When an incident happens, quickly communicating the issue to your users shows accountability and a proactive approach to the quality and reliability of your service. Simply put: it shows your users you care, which ultimately builds trust and makes for happier customers.
1) Identify the Specific Incidents Your Service Might Face
Each service will face different challenges and issues. It’s important to know what constitutes an incident for your specific case, and to be prepared for when one of these incidents inevitably occurs.
We recommend listing any and all possible incident scenarios for your specific service. You can then create runbooks for your team, along with templates and pre-written updates that can be easily plugged in and sent out to users in case of an incident.
2) Create Clearly Defined Roles and Responsibilities
Setting up a team with clearly defined roles and responsibilities will allow you to best identify, communicate, and resolve incidents with your service. Each team member should be well trained in their role, and the whole team should know what each role is responsible for, so that everyone understands the process and knows whom to escalate each kind of issue to.
Also, be sure to have backups for all roles! Incidents rarely occur when it is convenient for you and your team, so you’ll want someone available to help at all times.
Some important roles we recommend include:
- Major Incident Manager: Responsible for assessing the severity of different incidents, tracking incidents (changes, decisions, fixes) and confirming final fixes, holding post-incident reviews, and determining whether a public postmortem is needed in each case.
- Communication Managers: Responsible for determining which communication channels will be used both for internal and external (users) communications, writing external communications, and sending timely communications during an incident. The Communication Managers are also responsible for writing postmortems as needed.
- Customer Support Lead: Responsible for handling all incoming support tickets. The Customer Support Lead works closely with other teams to ensure optimal and consistent communications.
- Social Media Lead: Responsible for fielding and answering any questions from users on your social media accounts.
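To make the idea of backups for every role concrete, here is a minimal sketch of an on-call roster. The role names come from the list above; the people, the `ROSTER` structure, and the `who_to_contact` helper are hypothetical and only illustrate one way a team might record a primary and a backup per role.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class RoleAssignment:
    primary: str
    backup: str


# Hypothetical roster: role names follow the list above, the people are made up.
ROSTER: Dict[str, RoleAssignment] = {
    "Major Incident Manager": RoleAssignment("alice", "bob"),
    "Communication Manager": RoleAssignment("carol", "dave"),
    "Customer Support Lead": RoleAssignment("erin", "frank"),
    "Social Media Lead": RoleAssignment("grace", "heidi"),
}


def who_to_contact(role: str, unavailable: Set[str]) -> str:
    """Return the primary for a role, or the backup if the primary is unavailable."""
    assignment = ROSTER[role]
    for person in (assignment.primary, assignment.backup):
        if person not in unavailable:
            return person
    # Neither person can respond: surface this loudly instead of failing silently.
    raise LookupError(f"no one available for role: {role}")
```

If the lookup ever raises, that role has no coverage for that moment, which is exactly the gap a backup is meant to prevent.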
3) Set Up Your Different Communications Channels (Both Internal and External)
This is where your team begins structuring how an incident is communicated internally (within your team) and externally (to your users). Internally, this is how your team is alerted to new incidents, so that the right people are prompted and aware of their responsibilities. Externally, this is how you’ll alert your users that an incident has happened, provide ongoing updates, answer any questions they may have, and notify them of the final fix.
The most common communication channels include email alerts, social media, and a dedicated status page or embedded status plugin. Workplace chat tools such as Slack have become incredibly popular for internal communications.
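As a rough illustration of keeping internal and external channels separate, the sketch below registers senders per audience and broadcasts a message to every channel for that audience. `ChannelRegistry` and the stand-in senders are assumptions made for this example; real senders would wrap whatever email, chat, or status page integration your team actually uses.

```python
from typing import Callable, Dict, List

Sender = Callable[[str], None]


class ChannelRegistry:
    """Maps an audience ("internal" or "external") to the channels that reach it."""

    def __init__(self) -> None:
        self._senders: Dict[str, List[Sender]] = {"internal": [], "external": []}

    def register(self, audience: str, sender: Sender) -> None:
        if audience not in self._senders:
            raise ValueError(f"unknown audience: {audience}")
        self._senders[audience].append(sender)

    def broadcast(self, audience: str, message: str) -> int:
        """Send the message on every channel for the audience; return how many were used."""
        senders = self._senders[audience]
        for send in senders:
            send(message)
        return len(senders)


# Stand-in channels that only record messages (hypothetical, for illustration).
status_page: List[str] = []
team_chat: List[str] = []

registry = ChannelRegistry()
registry.register("external", status_page.append)
registry.register("internal", team_chat.append)
registry.broadcast("external", "We are investigating slow page loads.")
```

Keeping the two audiences in separate lists makes it harder to post an internal note to a public channel by mistake.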
4) Have Pre-Written Templates Ready to Go!
Timing is everything when handling your incident communications. Having pre-written templates can help your team quickly notify and update users of an incident without having to write completely new communications from scratch every time! Your template library will likely improve over time as new incidents prompt new templates. Still, a template is just a template: be sure to update it with specific details and timelines as needed in order to provide your users with the best possible information and expectations.
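A pre-written template can be as simple as a message with named placeholders. The sketch below uses Python’s standard `string.Template`; the template names, wording, and fields are hypothetical examples, not recommended copy.

```python
from string import Template

# Hypothetical pre-written templates keyed by incident type (illustrative wording only).
TEMPLATES = {
    "degraded_performance": Template(
        "We are investigating slower than normal response times for $service. "
        "Started: $started_at. Next update by: $next_update."
    ),
    "resolved": Template(
        "The issue affecting $service was resolved at $resolved_at. Cause: $cause."
    ),
}


def render_update(kind: str, **details: str) -> str:
    """Fill in a template, failing loudly so a raw placeholder never reaches users."""
    try:
        return TEMPLATES[kind].substitute(**details)
    except KeyError as missing:
        # Raised for an unknown template name or a placeholder with no value.
        raise ValueError(f"missing template or detail: {missing}") from None


message = render_update(
    "degraded_performance",
    service="Checkout",
    started_at="14:05 UTC",
    next_update="14:35 UTC",
)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: a missing detail stops the send instead of publishing a message with an unfilled placeholder.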
5) Communicate Incidents Promptly and Clearly When They Happen
When an incident happens, you’ll want to alert users right away and keep them updated throughout the duration of the incident.
Effective communication of an incident can be broken up into four main parts:
- Alert users that an incident has occurred. This is your first contact with users regarding an incident and should be initiated as soon as an incident occurs. If users discover incidents on their own without any type of alert, they may come to distrust your service and its reliability. And if your users are finding out about issues before you do, you have bigger problems.
- Provide users with regular updates throughout the duration of an incident. Your users will want to know what to expect and when the service might be back up and running correctly. Regular updates will show your users that your team cares and is working on a fix. Nothing is worse for users than hour-long gaps with no updates.
- Notify users once an incident has been resolved. It’s important to address the issue, what caused it, and how it’s been resolved. You’ll also need to notify your users if any additional steps are needed, such as changing a password or monitoring their credit card in case of a security or data breach.
- Conduct post-incident reviews, and write public postmortems when appropriate.
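One way to guard against long gaps between updates during an incident is a simple cadence check. The sketch below assumes a 30-minute target between updates; that number, the `MAX_GAP` constant, and the `update_overdue` helper are assumptions for illustration, so choose an interval that suits your service.

```python
from datetime import datetime, timedelta
from typing import Sequence

# Assumed target interval between user-facing updates (not a prescribed value).
MAX_GAP = timedelta(minutes=30)


def update_overdue(
    update_times: Sequence[datetime],
    now: datetime,
    max_gap: timedelta = MAX_GAP,
) -> bool:
    """Return True if users have waited longer than max_gap since the last update."""
    if not update_times:
        # No alert has been posted yet, so the first one is already due.
        return True
    return now - max(update_times) > max_gap
```

A check like this could run on a timer while an incident is open and remind whoever owns communications when the next update is due.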
Incidents happen! Don’t let them get the best of your team or your service’s reputation. Setting your team up with the right roles, communication channels, and clearly defined processes can help you effectively identify, monitor, and resolve incidents while better communicating them to your users. Learn how PagerDuty can help you with this by signing up for a 14-day free trial today.