Chef Testing at PagerDuty
At PagerDuty, all of our computing infrastructure is automated using Chef. We push out features and changes to our Chef codebase very frequently – often...
High-frequency trading accounts for 50% of US securities trading. With thousands of securities totaling millions of dollars traded every millisecond, robust and reliable computer systems...
This is the first post of a multi-part series on some of the operations challenges that the team at PagerDuty is solving. At PagerDuty we...
PagerDuty’s July Hack Day presented another batch of amazing projects from our staff. One project in particular has a lot of future potential to provide...
We’re rolling out Webhooks on incidents and it opens up a lot of fun new things. For background, Webhooks let you receive HTTP callbacks when interesting...
As a member of PagerDuty’s realtime engineering team, a top concern is designing and implementing our systems with high availability and reliability. On May 30,...
We spend an enormous amount of our time on the reliability of PagerDuty and the infrastructure that hosts it. Most of this work is invisible, hidden...
On January 24, 25 and 26, 2013, PagerDuty suffered several outages. The events API, used by our customers to submit monitoring events into PagerDuty from...
You’re a techie working for one of the multitude of startups that rushed to market, where the founders hastily glued a Rails app together with candy-bar wrappers and...
A few weeks ago I had the privilege of speaking at Surge 2012 in Baltimore, MD. The audience was made up of those whose focus was on better...
This is a guest post by Connie Quach, Sr. Product Manager, responsible for the web performance products at Neustar. In today’s competitive environment, website performance...
Sometimes you just have to tinker. Experimentation, trial and error are all part and parcel of the learning experience, and the gateway to bigger and...
At PagerDuty, we usually get a front seat to anything that’s wrong with the internet. Last weekend, a derecho storm took out 7% of AWS...
On the evening of Friday, June 29th, Amazon Web Services (AWS) experienced a major outage at its North Virginia location due to a loss of...
We have some very exciting news for all of our customers who are running mission-critical systems on AWS in the US-East region: we have migrated...
On Thursday, June 14, starting at 8:44pm Pacific time, PagerDuty suffered a serious outage. The application experienced 30 minutes of downtime, followed by a period...
As some of you know, PagerDuty suffered an outage for a total of 15 minutes this morning. We take the reliability of our systems very...
As a general rule, whatever percentage you think your test coverage is, it isn’t. Whatever amount of the known surface area you’re covering, there’s going...