Automating builds and tests with CI/CD lets us ship code frequently, with a high level of trust that bugs won’t reach end users. Why, then, are our CI/CD systems still so often painfully slow and unreliable, and our ability to deliver frequently blocked?
Site Reliability Engineering (SRE) aims to reduce the pain caused by unhealthy platforms and processes that affect the reliability and stability of production systems.
Join Buildkite’s Mel Kaulfuss as she looks at CI/CD through the lens of SRE.
In the session, you’ll learn how to bring SRE principles and practices to CI/CD, including:
- Defining meaningful SLOs (service-level objectives) and SLIs (service-level indicators)
- Observing system performance and metrics
- Using error budgets to tune your test suites and pipelines
By managing your CI/CD infrastructure and processes as you would your production systems, with an SRE mindset, you’ll be able to respond quickly when things go wrong and reclaim control over large, slow, and unreliable build and deploy processes.
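
As a rough illustration of the error-budget idea listed above, the sketch below treats the build success rate as an SLI and compares it with an SLO target. The names (`BuildWindow`, `success_rate`, `error_budget_remaining`) and the 95% target are assumptions made for this example; they are not taken from the session or from any Buildkite API.

```python
from dataclasses import dataclass


@dataclass
class BuildWindow:
    """Build counts observed over some window, e.g. the last 30 days (hypothetical)."""
    total_builds: int
    failed_builds: int


def success_rate(window: BuildWindow) -> float:
    """SLI: fraction of builds in the window that succeeded."""
    if window.total_builds == 0:
        return 1.0  # no builds means nothing has violated the objective
    passed = window.total_builds - window.failed_builds
    return passed / window.total_builds


def error_budget_remaining(window: BuildWindow, slo_target: float) -> float:
    """Failures the SLO still allows in this window (negative means overspent)."""
    allowed_failures = (1.0 - slo_target) * window.total_builds
    return allowed_failures - window.failed_builds


# Example: 200 builds, 6 failures, and an assumed SLO of 95% successful builds.
window = BuildWindow(total_builds=200, failed_builds=6)
sli = success_rate(window)
remaining = error_budget_remaining(window, slo_target=0.95)
print(f"SLI: {sli:.3f}, failures left in budget: {remaining:.1f}")
```

A team might use a number like `remaining` to decide when to pause feature work and fix flaky tests or slow pipeline steps, though the right SLI and target depend on the pipeline in question.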