Triage PagerDuty Alerts Using Loggly
Guest blog post by Jason Skowronski, product manager at Loggly. Loggly is the world’s most popular cloud-based log management service, with over 3,500 active customers. Developers and system administrators use it to troubleshoot problems, monitor system status, and proactively address issues with alerts.
You’re out to dinner with friends and you receive an alert through PagerDuty. Your signup rate has dropped way below its usual level. This could indicate a serious problem with your site, but it could also just be an unusual traffic pattern. Should you leave the restaurant and rush home? Or would you just be sacrificing much-deserved downtime for something that could wait until tomorrow?
Alerting is critical to the 24×7, net-centric economy. It’s a way to minimize the impact of application problems on revenue and profits. At Loggly, we love PagerDuty because it has brought sanity to how we become aware of operational problems, assign the right resources to solve those problems, and follow them to completion. It answers the all-important “who” questions and is a perfect complement to the Loggly service, which gives DevOps teams a way to delve into the “why.”
Triage and Find the Root Cause of Problems Faster
Let’s go back to our interrupted meal. PagerDuty tells you that an alert fired because of an unexpected drop in signups. However, you need more information: which system the problem is coming from and who is responsible for it. You click the alert and go straight to your Loggly dashboard. There you see that the alert fired at the same moment a deployment went out. So this is probably a real site problem. Time to get the check.
While you’re waiting, you search your signup page logs. You see that clicks are being recorded but that calls aren’t consistently reaching the back-end service. Later, some digging into the code shows that the page isn’t rendering correctly in Internet Explorer. You roll back the deployment, file a bug with the front-end team, and resolve the PagerDuty alert.
Loggly gives DevOps teams deeper visibility into their systems, both during initial assessment and triage and while they isolate and resolve operational problems. Search and filtering, point-and-click charting, and dashboards help you quickly make sense of large volumes of log data from applications, platforms, and systems. You can see correlations between an alert and other events on your systems, such as a deployment, and you have access to all the data you need to find root causes.
As a result, you can stop interrupting your day for small problems and focus on the big ones. You can also solve those big ones much faster.
Hang out with Loggly, New Relic and PagerDuty tonight at DataBeat’s reception and have some Data-tinis on us!