Monitoring is an ancient discipline, but one that has evolved significantly in the past few years. Modern monitoring platforms collect large volumes of data from our systems: work and resource metrics, events happening inside and outside our applications, distributed tracing data, real user monitoring, and more.
But are we using all that data in a way that helps to avoid outages without causing alert fatigue? Or are we suffering from information overload in our monitoring systems? We’ll present strategies for organising your system data so that your teams can anticipate user-facing issues and are paged only when immediate attention is required.