Best Practices & Insights

Alerting, Best Practices & Insights, Operations Performance

On-Call Best Practices: Page Your Manager

Having one person on-call isn't enough. What happens if your on-call engineer sleeps through their alert? What happens if their phone's battery dies without them knowing, or if they get an alert at a really inconvenient time, like when stuck on a bus or in traffic? It will happen. We present best practices for backup: one or more people, waiting in the wings, ready to spring into action if your primary on-call is unable to perform their duties at any given time.

Alerting, Best Practices & Insights, DevOps, On-Call Life, Operations Performance

The Best Metrics for Driving Cultural Change in DevOps Teams

Everyone wants to optimize their team’s performance, but coming up with a good plan for doing so isn’t always easy. That’s why operationally mature DevOps teams use metrics to gain valuable insight into their work, enhance their capacity, and drive cultural change. Here we outline the key metrics that you should be monitoring and talk about how they can influence your team’s culture and performance.

4 min read

Alerting, Best Practices & Insights

Best Practices in Outage Communication: Internal Stakeholders

When you’re in the middle of an outage, the last thing you want is people from all over the company constantly asking you when it’s going to be fixed. Your job is busy enough without having to play translator and communication whiz when you have more important things to be worried about. But at the same time, your outage affects people outside of your team. You can’t neglect communicating with internal stakeholders like your manager, or your CTO, or your CEO, or your marketing department, or your customer support team. You see where I’m going with this. So how do you keep your internal stakeholders informed in a timely, efficient fashion?

4 min read

Alerting, Best Practices & Insights

Best Practices in Outage Communication: Incident Team

You’ve just realized that something has gone critically wrong, and you can’t fix it yourself. Particularly if you work within a collaborative DevOps environment, it’s better to get by with a little help from your friends. Effectively coordinating the incident response across subject matter experts and front-line responders is a secret to operational success that differentiates top teams. So it’s important that you have an effective and efficient way to sound the alarm and make sure that your conversations are recorded and actionable.

4 min read

Alerting, Best Practices & Insights, Reliability

Best Practices in Outage Communication: Customers

Outages are chaotic, and it can be difficult to figure out the best way to let your customers know what is going on. One of the first big decisions you’ll need to make is whether you’re going to respond only to people who inquire about the issue, or if you’re going to be more proactive and post updates publicly. Many of the leading technology companies have begun to transparently discuss outages with their customers, and there are a number of good business reasons for doing so. Regardless of your approach, here are 6 things you can do to ensure successful customer communication during outages.

8 min read