Community

Alerting, Announcements, Community, Partnerships

Give Silent Failures a Voice with Dead Man’s Snitch and PagerDuty

Don't let the hardboiled-sounding name of our latest integration scare you off, because this monitoring service is a great way to get notified when one of your mission-critical scheduled tasks suddenly sleeps with the fishes. Dead Man’s Snitch is an uptime monitor for cron or other periodic jobs like backups or batch processing, and it alerts you when your jobs don’t run so you can investigate before a missed run becomes a problem.

3 min read

Announcements, Community, Events

Make Some Noise with Our Custom Alert Sound Contest @pagerduty @AWSreinvent #PickYourPage

We’re excited to announce our first-ever custom alert sound contest! Beginning September 21, 2015, we will accept submissions for a chance to be included as an alert sound in our mobile app. We have a great community, and we want to see them get creative. Or ironic. Or immature. Songs, clever noises, avant-garde recordings of one hand clapping - all are welcome. Send your best creation to pickyourpage@pagerduty.com.

2 min read

Community, Events, ITOps & Modern Ops, Operations Performance

Three Ways to Ramp Up Your Enterprise IT Operations Management

As indicated in a survey conducted by Forrester Research, a well-constructed IT Operations management system provides fast alert notification, keeps business-critical incidents to a minimum, and focuses on automation as a way of addressing issues. What we are actually seeing in the field today, however, doesn’t seem to line up with this approach. According to a recent Forrester thought leadership paper, incident resolution practices today are tactical and reactive, and they harm commercial success. Listed below are some of our observations about enterprise IT organizations.

2 min read

Alerting, Announcements, Community, Features, On-Call Life

It's a Match! Swipe Incidents with PagerDuty Mobile App Update

We're pleased to announce our fourth major mobile release, which brings some significant improvements to the performance and usability of key parts of the app. With all these changes, it’s faster and easier than ever to see, investigate, and take action on problems in your system — driving down resolution time and helping your team improve your operations performance.

2 min read

Alerting, Community, On-Call Life, Partnerships

How Etsy Drives a Culture of Empathy, Autonomy, Learning

Etsy occasionally runs an engineer exchange program, where they trade engineers with another tech company to give both organizations insight into what the other does differently. PagerDuty was their most recent participant, and in May, I had the pleasure of spending a week at Etsy’s office in Brooklyn. I learned from their practices, observed what they were doing well, and gained insight into their team dynamics. Etsy has an amazing culture, and I observed the customs they put into place to maintain their environment of empathy, autonomy, and learning. It was a great example of the traditions a company can foster to maintain a productive and happy work environment.

6 min read

Announcements, Community, Partnerships

PagerDuty + Opsmatic = Faster incident resolution

Opsmatic provides real-time visibility of any change to the live state of your infrastructure and intelligently alerts you before trouble begins. The recent addition of Assertions gives you a precise way to check and enforce policy across all your hosts. It’s only natural that Opsmatic has partnered with PagerDuty to ensure flawless alerting and effective incident collaboration. PagerDuty’s operations performance platform ensures that the right people on your team get alerted and can resolve incidents before they become emergencies.

2 min read

Community

What is Operational Maturity?

Long-time PagerDuty customers Dropbox, Flipboard, and Splunk spoke about their hard-won experience, shared war stories, and discussed what they’ve learned about operations at scale. They also had advice about how what they’ve learned can be applied to other teams. We were delighted to talk with customers, partners, and the extended community about what it means to be operationally mature. Here is what was said about Operational Maturity.

4 min read

Alerting, Community, ITOps & Modern Ops, On-Call Life, Operations Performance

Customer Perspective: Setting Up IT Operations Software for Startups

This is a guest blog post written by Anthony Gibbons, the Operations Manager at Airhead Education. Anthony gives his perspective as a startup setting up PagerDuty as their IT Operations Software: "With the advent of cloud services and companies willing to integrate with each other, it is now entirely possible for a small startup to use the same monitoring tools as industry stars such as Airbnb, Pinterest and Path... It probably took me an hour to integrate all of my services with PagerDuty."

5 min read

Announcements, Community, Partnerships

CloudMonix and PagerDuty Join Hands for Next-Gen Cloud Monitoring

CloudMonix’s core objective is to simplify, streamline, and automate routine or complex tasks for Cloud System Administrators and IT Professionals, so we are always looking to improve the way we deliver our services. That’s why we have partnered with PagerDuty to deliver instant alerts and notifications on PagerDuty’s leading Incident Management platform.

4 min read

Alerting, Community, Operations Performance, Reliability

The Discovery of Apache ZooKeeper’s Poison Packet

ZooKeeper, for those who are unaware, is a well-known open source project which enables highly reliable distributed coordination. It is trusted by many around the world, including PagerDuty. It provides high availability and linearizability through the concept of a leader, which can be dynamically re-elected, and ensures consistency through a majority quorum. The leader election and failure detection mechanisms are fairly mature, and typically just work... until they don't. How can this be? Well, after a lengthy investigation, we managed to uncover four different bugs coming together to conspire against us, resulting in random cluster-wide lockups. Two of those bugs lay in ZooKeeper, and the other two were lurking in the Linux kernel. This is our story.

15 min read

Alerting, Announcements, Community, Events, Partnerships

Cut Your Resolution Time with AppDynamics and PagerDuty

Application Performance Monitoring (APM) systems like AppDynamics can provide incredibly rich information about what’s happening with your IT infrastructure, and can identify performance issues before they create big problems. However, this information is only as good as your ability to respond to it. PagerDuty can extend the capabilities of AppDynamics Alert & Respond policies to ensure incidents are noticed, responded to, and fixed quickly.

2 min read

Community, Events, On-Call Life

PagerDuty User Group

We hosted our first user group last week at PagerDuty HQ! Not only did we gather our awesome customers and enjoy the taco bar and cervezas, but we also learned a lot from them and shared our roadmap, and our customers learned from each other, too. We really value user feedback as part of how and why we build our product. We wanted to share some key takeaways from our sessions during the event.

2 min read