Reliability

Reliability

Pressure Release Valves

This is the fourth in a series of posts on increasing overall availability of your service or system. Have you ever gotten paged, and known...

John Laban

5 min read

increasing-availability, john, MTTR, reliability

Operations Performance, Reliability

What monitoring tools do you use?

We support any monitoring tool that can send an email or make a JSON call, but we support tighter integration with some than others. We...

David Hayes

1 min read

data, reliability

Reliability

A Standard Operating Procedure for when s*IT hits the fan

This is the third in a series of posts on increasing overall availability of your service or system. In the first post of this series, we...

John Laban

3 min read

increasing-availability, john, MTTR, reliability

Reliability

More control over Optimistic Locking in Rails

Like pretty much everything else in Rails, optimistic locking is nice and easy to set up: you simply add a “lock_version” column to your ActiveRecord model...

John Laban

2 min read

Code, john, Optimistic Locking, reliability

Reliability

Availability lessons from shoe companies and ancient warlords

This is the second in a series of posts on increasing overall availability of your service or system. In the first post of this series,...

John Laban

6 min read

increasing-availability, john, MTTR, reliability

Reliability

Outage Post-Mortem

As you may already know, PagerDuty suffered an outage of 30 minutes yesterday, followed by a period of increased alert delivery times. We’re taking the downtime...

Andrew Miklas

6 min read

reliability

Reliability

If an asteroid strikes PagerDuty

Updated on 9/21: We have replaced Twitter with our status page as a communication method. At PagerDuty we strive for 100% uptime, and it is a...

Baskar Puvanathasan

3 min read

reliability

Reliability

Standing on the shoulders of giants and stumbling with them – the Amazon AWS outage’s "pain" statistics

Today, at around 1am Pacific Time, Amazon began having major problems with some of their cloud infrastructure: specifically with their EC2, EBS, and RDS offerings. We'd like to share some statistics on the alerts we sent out - via phone or SMS - during the outage.

John Laban

3 min read

john, reliability

Reliability

The ups and downs of Availability

This post is a quick introduction to some concepts of system availability, so that subsequent posts in this series make sense. I'll go over concepts like availability, SLA, mean time between failures, mean time to recovery, etc.

John Laban

4 min read

increasing-availability, john, MTBF, MTTR, reliability, SLA

Reliability

Fixing The Back Button: AJAX History And Bookmarks

We've added deep linking to the incidents table. The browser will now remember all your interactions with the table as you move throughout your account or recall your bookmarks.

PagerDuty

2 min read

reliability

Reliability

Load Balancers need static IPs!

We’ve been hosting PagerDuty on AWS for about the last year. One of the biggest draws to the platform for us was the promise of ready-built components...

Andrew Miklas

3 min read

reliability

