A Security-First Culture for Better Cloud Security and Faster Incident Management
Last month, PagerDuty and Threat Stack came together in Seattle to co-host a workshop — “Incident Management in the 21st Century”. As Threat Stack’s Senior Director of Operations and Support, I had the pleasure of joining PagerDuty’s VP of Product, Jonathan Wilkinson, to talk about cloud security and incident management.
The common thread across both of our talks was how companies can quickly and effectively notify the right people when an important incident occurs by getting ops, security and dev teams to collaborate more closely.
You see, the process teams follow when their website goes down applies just as well when their website is hacked. Organizations can apply their existing PagerDuty alerting processes to security-related issues. And the very best way to enable this is by building a security-first culture that allows for effective security alerting, escalation and collaboration. This is one of the key reasons why PagerDuty and Threat Stack recently announced an integration that enables customers to manage cloud security incidents, such as user logins, suspicious processes running and configuration changes, within PagerDuty.
A point we all agreed on during the workshop was that, considering today’s fast-advancing threats, a security-first culture is no longer a nice-to-have; it’s a need-to-have. At Threat Stack, we decided very early on to adopt a security-first culture by integrating our security, ITOps, and DevOps teams. Here are a few things that we learned are effective in building this culture, and a look at how we made it happen.
The Best Security Culture is Collaborative, Not Prescriptive
Getting your entire team to understand security and incident response is no small undertaking. ITOps and DevOps teams are now faced with questions that they have never had to deal with before, ones that were previously considered only “security team” problems.
Here are a few of the most frequently asked:
- How do we become more proactive about application security?
- How do we build “secure by default” systems?
- What are the best tools to use to accomplish all of this?
- How do I notify the right people, with the right information, in the shortest time possible?
Security should be as integral and collaborative as possible, not rigid and prescriptive. But what if you don’t have a dedicated security team? What if you are a small, resource-constrained startup? The reality is that when it comes to security, we are all suffering from the same time and resource constraints regardless of team size, and we all want the same thing: better security.
In fact, adopting modern security policies is not much different from adopting DevOps practices. By embedding monitoring into daily workflows, security, DevOps and ITOps teams can gain the deep insights into services, users, and activities that they need to ensure application security. Many companies today already have experience integrating DevOps methodologies; GE Capital, Macy’s, Target, and Nordstrom were some of the very first. Each of these companies faced some pretty powerful and entrenched silos, and in each case the success of DevOps came down to enabling a collaborative culture. The same principles can be applied to implementing a security-first culture.
Another great example is the three-year transformation of the Twitter Infosec function, which started when the @BarackObama account was hacked. It’s an incredible story of how they integrated security into the daily work of dev and ops teams, with the primary mission of not getting in their way.
Security Processes and Visibility Go Hand-in-Hand
Over the past few years, I’ve talked with many Threat Stack customers who require more security visibility into the activity on their systems. Similarly, a lot of operations people want this too — insight into what’s happening on their stack. But all the insight in the world won’t help them if they don’t have defined processes to handle events.
Many people say, “It must be easy for you. You can just use Threat Stack internally.” While this is true, and we do, the fact is that if an internal culture and process aren’t set up to support the data that tools like Threat Stack and PagerDuty provide, you might as well not even have them (although, admittedly, we hope you do). Likewise, attempting to shoehorn antiquated tools (NIDS, anyone?) into a modern cloud-native world would bring unnecessary pain to organizations.
By developing a security-first culture, including the workflow the PagerDuty and Threat Stack integration provides for incident management and incident resolution, we were able to reduce our response time by automating alerts and the data around security events. Getting the right people together as quickly as possible to assess the situation was a huge part of reducing our time-to-resolution. This let us focus on responding and resolving, instead of being stuck in the weeds.
How Threat Stack Implemented a Security-First Culture
Alert Escalation with PagerDuty
We are huge PagerDuty fans at Threat Stack, and personally I’ve used PagerDuty since the year they launched. People often consider using PagerDuty for when their website crashes or their database goes offline, but it is good for much more than that. It’s an amazing tool that we use at Threat Stack every day for production alerting, health checks, analytics and some other cool custom security use cases. We consider it to be one of the most important and valuable tools we have internally.
There are a number of ways PagerDuty helps us at Threat Stack to implement collaborative and high-visibility security processes:
- Notifications from production systems with high alert severity levels
- Scheduling and overrides
- Per-service escalation groups
- Suspicious logins or processes running
The infernal sad trombone wakes up our team — even in the dead of night — to let us know there is trouble in the cloud. We have a love/hate relationship with that infamous alert sound; it keeps us on our toes at all hours of the day so that we, as a team, can have visibility and response processes in place to keep our applications and systems secure.
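To make the first and last items in the list above more concrete, here is a minimal sketch of how a high-severity security alert could be turned into a PagerDuty trigger event. It assumes the public PagerDuty Events API v2 endpoint and its documented event fields; the 1-3 severity mapping, the function names, the host name and the routing key placeholder are all hypothetical and are not taken from the Threat Stack integration itself.

```python
import json
import urllib.request

# Public Events API v2 endpoint. The routing key used below is a placeholder.
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

# Hypothetical mapping from an internal 1-3 alert severity to the
# severity values that Events API v2 accepts.
SEVERITY_MAP = {1: "critical", 2: "error", 3: "warning"}


def build_trigger_event(routing_key, summary, source, alert_severity, details=None):
    """Build an Events API v2 'trigger' event for a security alert."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            # Unknown severities fall back to the lowest level, "info".
            "severity": SEVERITY_MAP.get(alert_severity, "info"),
            "custom_details": details or {},
        },
    }


def send_event(event, url=PAGERDUTY_EVENTS_URL, timeout=10):
    """POST the event as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Build (but do not send) an example event for a suspicious login.
example_event = build_trigger_event(
    routing_key="YOUR_INTEGRATION_KEY",  # placeholder, not a real key
    summary="Suspicious login on web-01",
    source="web-01",
    alert_severity=1,
    details={"user": "root"},
)
print(json.dumps(example_event, indent=2))
```

Calling send_event(example_event) with a real integration key would page whoever is on call for that service. Keeping the payload builder separate from the network call makes the severity mapping easy to test on its own.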
Open Communication with Slack
Another process we implemented was integrating our security alerting into Slack — the same chat system we use for everything else internally. Now when a security alert fires, Threat Stack drops a message into a designated Slack channel so that everyone in the channel has visibility and can respond. What’s really great about this is that we can have a real-time conversation on Slack about the event taking place. Was some code pushed? Was that an accidental failed login to a host?
For the Threat Stack team, the most valuable feature of PagerDuty is that it gives people a place to start a conversation and say something if they see something. We can then use Slack to acknowledge, discuss and hopefully dismiss quickly.
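As a rough illustration of dropping an alert into a chat channel, the sketch below formats a security alert and posts it through a Slack incoming webhook, which accepts a JSON body with a "text" field. The shape of the alert dictionary, the function names and the SLACK_WEBHOOK_URL environment variable are assumptions made for this example, not part of the Threat Stack product.

```python
import json
import os
import urllib.request


def format_alert_message(alert):
    """Turn an alert dict (hypothetical shape) into an incoming-webhook payload."""
    text = (
        f"*Security alert (severity {alert['severity']})*: {alert['title']}\n"
        f"Host: {alert['host']} | User: {alert.get('user', 'unknown')}"
    )
    return {"text": text}


def post_to_slack(webhook_url, message, timeout=10):
    """POST the message as JSON to a Slack incoming webhook."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


example_alert = {
    "severity": 2,
    "title": "Failed SSH login",
    "host": "web-01",
    "user": "deploy",
}
message = format_alert_message(example_alert)
print(message["text"])

# Only post when a webhook URL is configured (the variable name is an assumption).
webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
if webhook_url:
    post_to_slack(webhook_url, message)
```

Including the host and user in the message gives the channel enough context to answer the obvious questions (was code pushed, was it a mistyped password) without anyone leaving the conversation.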
Getting a process in place, even something simple, is better than nothing. From there, teams can adjust their process at any time, especially as the company grows and evolves. In all of the previous examples, we needed tools to implement “secure by default” systems, but we also needed a company culture that enabled anyone to call out security issues and raise awareness.
I encourage all companies to begin implementing a similar approach so we can demonstrate better cloud security practices as a community.