PagerDuty Helps CTC Transform Operations in a Remote World
Founded in 1995, Chicago Trading Company (CTC) is a derivatives trading firm that specializes in market trading across a variety of products, services, and strategies. CTC actively trades in a broad spectrum of asset classes, including equities, interest rates, and commodities. Its trading desks are open 20 hours a day, six days a week, and the company is recognized as a leading provider of liquidity and pricing on numerous equities and derivatives exchanges around the world.
Because the market fluctuates by the microsecond, CTC’s critical applications and services must always be online and available to users at a moment’s notice to deliver a consistent customer experience. “With our services directly tied into the open market, downtime is just not an option,” explained Luke Rotta, Manager, SRE and Observability at CTC. “If we’re not in the market, we’re not participating in the opportunity, and it’s a missed opportunity.” Rotta is responsible for managing observability at CTC, as well as overseeing the SRE team that supports, automates, and improves uptime for the pre-production and production environments.
Before PagerDuty
Before implementing PagerDuty, Rotta’s team experienced several challenges, including:
Delays in response stemming from a manual on-call directory with outdated schedules and rotations
Difficulty communicating with on-call responders during non-business hours
Lack of automation embedded into the response process, which led to more manual work for on-call responders
A legacy dashboard cluttered with unactionable events and alerts, creating delays in incident acknowledgement and resolution
Alert storms that made it harder for teams to understand the makeup of incidents and respond to them effectively
With the recent push towards remote work, CTC was forced to quickly pivot operations to a digital-first model. Additionally, heightened market volatility meant that its customers also increased the frequency of their trading, making it more important than ever that the CTC trading platform stayed up and running at all times.
To help achieve this, CTC needed to rethink its incident management process while continuing to maintain and deliver a consistent customer experience. This meant Rotta’s teams needed to refocus their efforts on day-to-day operations rather than long-term projects—and all in a new, remote-first environment. “Our teams are laser-focused on making sure systems can handle the increased capacity and deliver liquidity to the marketplace to keep our customers happy,” shared Rotta.
Prioritizing Communication and Collaboration
Before going remote, most information was communicated verbally in the office. Now, with the entire company working remotely, the ability to effectively communicate and collaborate across teams is more important than ever. PagerDuty helped CTC transform its incident communication channels to be completely digital. “PagerDuty really taught us to spin up an incident remotely and allowed us to centralize our incident management process to quickly assemble teams into a single channel and make decisions directly from there.”
CTC also uses Slack, part of PagerDuty’s ecosystem of more than 600 integrations, to improve incident communication and collaboration between teams, as well as for conducting postmortems. With the Slack integration, teams can create, respond to, and resolve PagerDuty incidents directly inside the Slack interface, which alleviates the stress of juggling multiple communication channels and allows all necessary teams to work through the incident together. “Since all teams are remote now, we just create the incident directly in Slack. The playbook tells everybody what Zoom room to jump into, and off we go,” shared Rotta.
Improving Operational Visibility
In a digital-first environment, stakeholders need real-time visibility into the health of their critical systems and services so they can quickly orchestrate a proper response when an incident occurs.
Before PagerDuty, CTC used a traditional dashboard that would alert the team about service disruptions and incidents. “We would get what we call the ‘wall of red,’ which was quite literally a screen filled with hundreds of alerts, with no sense of what’s being impacted or what’s going on in our environment,” explained Rotta.
To combat this issue, CTC implemented PagerDuty Event Intelligence to automatically group alerts together and cut down the noise for all mission-critical services and applications. “Before PagerDuty, we sometimes had 50-200 alerts coming in at once. With Event Intelligence, that number is now down to 5-10,” explained Rotta.
With Event Intelligence, CTC’s response teams also have the context they need to quickly resolve an issue before it becomes widely customer-impacting. “The ability to reduce the noise and clear out alerts within the platform really frees up a lot of time for people on our SRE team to focus on higher-impact tasks,” said Rotta.
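To make the idea of alert grouping concrete, here is a minimal sketch of one common approach: alerts for the same service that arrive close together in time are folded into a single incident. This is an illustration only, not PagerDuty Event Intelligence's actual algorithm. The `Alert` and `Incident` classes, the `group_alerts` function, the five-minute window, and the `svc-0` style service names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    service: str          # hypothetical service name
    summary: str
    received_at: datetime


@dataclass
class Incident:
    service: str
    alerts: list = field(default_factory=list)


def group_alerts(alerts, window=timedelta(minutes=5)):
    """Fold alerts into incidents, one per service per burst.

    An alert joins the most recent incident for its service if it arrives
    within `window` of that incident's latest alert; otherwise it opens a
    new incident. The window length is an assumed, tunable value.
    """
    incidents = []
    latest_by_service = {}  # service -> most recent Incident for that service
    for alert in sorted(alerts, key=lambda a: a.received_at):
        current = latest_by_service.get(alert.service)
        if current and alert.received_at - current.alerts[-1].received_at <= window:
            current.alerts.append(alert)
        else:
            current = Incident(service=alert.service, alerts=[alert])
            incidents.append(current)
            latest_by_service[alert.service] = current
    return incidents


# A simulated alert storm: 60 alerts across 3 services within one minute.
start = datetime(2021, 1, 4, 9, 30)
storm = [
    Alert(service=f"svc-{i % 3}", summary=f"check {i} failed",
          received_at=start + timedelta(seconds=i))
    for i in range(60)
]
incidents = group_alerts(storm)
print(len(storm), "alerts ->", len(incidents), "incidents")
```

In this toy example the responder sees three incidents, each carrying its 20 underlying alerts as context, instead of a screen of 60 separate notifications.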
Like many companies today, CTC needs to continue scaling to keep up with customer demand and new innovations. Even though speed is table stakes at a trading firm such as CTC, running non-latency-sensitive workloads within AWS has given CTC the ability to scale more quickly and reduce time to market for ideas. Many of the new services deployed to AWS follow a you-build-it, you-own-it approach, and PagerDuty provides a single way to escalate, track, and measure incidents across the company regardless of who owns or supports the service.
Benefits With PagerDuty
Since implementing PagerDuty, CTC has seen several benefits, including:
Reduced alert fatigue and improved incident response with PagerDuty Event Intelligence
Faster mean-time-to-acknowledge/mean-time-to-respond (MTTA/MTTR) across all critical systems and services
Improved day-to-day incident management and the ability to automate the hand-off of incidents from shift to shift
An open line of communication with senior traders on the floor that escalates incidents to on-call managers across time zones when needed
Seamless incident management experience for 24x7 applications running on AWS
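For readers less familiar with the MTTA and MTTR metrics mentioned above, the following sketch shows how they are typically computed from incident timestamps. It assumes MTTA is measured from trigger to acknowledgement and MTTR from trigger to resolution, which is one common convention. The sample incidents, field names, and the `mean_minutes` helper are hypothetical and are not CTC data or a PagerDuty API.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(incidents, start_key, end_key):
    """Average elapsed minutes between two timestamps across incidents."""
    return mean(
        (inc[end_key] - inc[start_key]).total_seconds() / 60
        for inc in incidents
    )


# Hypothetical incident records, for illustration only.
sample_incidents = [
    {
        "triggered": datetime(2021, 1, 4, 9, 0),
        "acknowledged": datetime(2021, 1, 4, 9, 2),
        "resolved": datetime(2021, 1, 4, 9, 30),
    },
    {
        "triggered": datetime(2021, 1, 4, 14, 0),
        "acknowledged": datetime(2021, 1, 4, 14, 4),
        "resolved": datetime(2021, 1, 4, 14, 20),
    },
]

mtta = mean_minutes(sample_incidents, "triggered", "acknowledged")
mttr = mean_minutes(sample_incidents, "triggered", "resolved")
print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")
```

Tracking these two averages over time is one simple way to check whether changes such as alert grouping or automated hand-offs are actually shortening response.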
PagerDuty also helped support CTC’s business continuity strategy. “In this new, remote environment, employees can feel disconnected from what's going on, and we're trying to solve that with PagerDuty. Almost everyone at the company is on the PagerDuty platform, whether they’re a stakeholder or a full user,” shared Rotta.
Future Looking
CTC plans to continue expanding its use of PagerDuty across the organization. For example, the company has decided to focus more on metrics to inform future actions, so Rotta’s team is looking into Operational Reviews, as well as PagerDuty Analytics and Intelligent Dashboards, to help better understand team health and the business impact of incidents, measure SLAs, and gain the ability to seamlessly share metrics with executive leadership. “This could help drive decisions around what applications we need to invest in,” explained Rotta.
Additionally, while CTC already has all of its major business services set up in Status Dashboards, it is looking to extend their use by giving executive leadership improved visibility into the status of an incident or a service. As the PagerDuty platform grows with CTC, Rotta and his team look forward to extending the platform’s functionality across other parts of their infrastructure. “I like that it’s simple. I don’t have to manage anything because it just does its job,” he shared.
To learn how PagerDuty can help your team make things simple and transform operations in a digital-first world, contact your account manager or try a 14-day free trial today.