
AWS Orchestration with Systems Manager & Runbook Automation

by Jake Cohen October 20, 2023 | 4 min read

“We have the automation, but it will need to be invoked separately for each account. 

Doable, but time-consuming and error-prone. Oh, and only someone from SRE can do it.”

Operating across numerous regions and cloud accounts is now the de facto standard for companies. The reasons for this vary and, depending on where you sit in the organization, may be more or less apparent to you:

  • Security: with firmer boundaries between environments, an attacker who compromises one environment can reach less sensitive data.
  • Cost Isolation: segmenting workloads and projects by account makes it easier to identify and restrict spend.
  • Resource Ownership: with accounts dedicated to specific teams, more autonomy can be given to the management within those departments.
  • Experiments: teams can run experiments in cloud accounts that pose less risk to the organization because they are isolated from accounts that may contain sensitive business information and production workloads.

While this is by no means an exhaustive list, it shows why a given company is likely to have multiple, if not numerous, cloud accounts. Having multiple cloud accounts reinforces the core benefits of operating in the cloud: security, flexibility and velocity.

But these benefits come at a cost in complexity, introducing new challenges in managing environments and operating workloads across multiple cloud accounts. Specifically, standard procedures that interact with the workloads in these environments become toilsome when multiple regions or cloud accounts are involved.

As it turns out, a large class of use cases involves this type of interaction:

  • Auditing & Compliance: inevitably, there are times when all compute instances need to be audited or investigated. Simple examples include checking whether a particular version of a software package is installed or whether a specific user or service account exists.
  • Config Management: organizations want to apply standard configuration settings to their software and therefore need to run scripts or playbooks across all compute instances.
  • Incident Response: regardless of which cloud account an instance resides in, teams want a standardized approach for pulling diagnostics or invoking remediation. This is especially true for software companies that offer a “managed deployment” option within their customers’ cloud accounts.
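To make the auditing case concrete, checking for a particular package version can be expressed as a single Systems Manager `SendCommand` request that uses the built-in `AWS-RunShellScript` document. The minimal sketch below only builds the request arguments; the package name, the tag key and value used for targeting, and the `rpm`/`dpkg-query` fallback are illustrative assumptions, not part of Runbook Automation.

```python
import shlex
from typing import Any, Dict


def package_audit_request(package: str, tag_key: str, tag_value: str) -> Dict[str, Any]:
    """Build keyword arguments for SSM SendCommand that report a package's installed version.

    Instances are selected by tag rather than by instance ID, so the same request
    can be reused unchanged in every account and region.
    """
    pkg = shlex.quote(package)  # guard against shell metacharacters in the name
    # Try rpm first (Amazon Linux, RHEL); if it fails or is absent, fall back to
    # dpkg-query (Debian, Ubuntu). Either prints the package name and version.
    check = f"rpm -q {pkg} 2>/dev/null || dpkg-query -W {pkg}"
    return {
        "DocumentName": "AWS-RunShellScript",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {"commands": [check]},
        # SendCommand's Comment field is capped at 100 characters.
        "Comment": f"Audit installed version of {package}"[:100],
    }
```

With boto3, the dict would be unpacked into an SSM client, for example `ssm.send_command(**package_audit_request("openssl", "Environment", "prod"))`, and per-instance output fetched afterwards with `get_command_invocation`.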

Again, this is not an exhaustive list, but it highlights some of the more common use cases. Today, automating these tasks across multiple cloud accounts requires operations engineers either to write scripts that handle the “orchestration layer” for connecting to each cloud account, or to “manually” invoke their automation (e.g., playbooks) within each account separately. Both approaches are time-consuming, error-prone, or both, and typically only experienced engineers can carry them out.

PagerDuty’s Runbook Automation solves this problem by serving as a self-service and orchestration layer on top of traditional automation tooling. For AWS specifically, Runbook Automation integrates with Systems Manager (SSM) to dispatch commands and scripts to EC2 instances across cloud accounts.

To do this securely and at scale, Runbook Automation first integrates with a “central” AWS account using the industry-standard practice for connecting a SaaS vendor to an AWS account: a cross-account role protected by an External ID. Once integrated with the central account, Runbook Automation uses the AWS STS AssumeRole operation to take on an IAM role in a separate (“remote”) AWS account, where it then uses Systems Manager to execute commands and scripts on the target EC2 instances:
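The trust relationships behind this two-hop setup can be sketched as IAM trust policies, written here as Python dicts so they can be generated and checked programmatically. This is the generic AWS cross-account pattern under stated assumptions, not Runbook Automation's documented configuration: the account IDs, role names, and External ID value are placeholders, and the permissions policies (SSM actions such as `ssm:SendCommand` on the remote role, `sts:AssumeRole` on the remote role ARNs for the central role) are not shown.

```python
import json
from typing import Any, Dict


def central_role_trust_policy(vendor_principal_arn: str, external_id: str) -> Dict[str, Any]:
    """Trust policy on the central-account role that the SaaS vendor assumes.

    AssumeRole succeeds only if the caller presents the agreed External ID,
    which guards against the "confused deputy" problem.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": vendor_principal_arn},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


def remote_role_trust_policy(central_role_arn: str) -> Dict[str, Any]:
    """Trust policy on each remote-account role: only the central role may assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": central_role_arn},
            "Action": "sts:AssumeRole",
        }],
    }


if __name__ == "__main__":
    # Placeholder ARNs and External ID, for illustration only.
    print(json.dumps(central_role_trust_policy(
        "arn:aws:iam::999999999999:root", "example-external-id"), indent=2))
    print(json.dumps(remote_role_trust_policy(
        "arn:aws:iam::111111111111:role/CentralAutomationRole"), indent=2))
```

Keeping the External ID condition only on the first hop reflects the split described above: the vendor-facing role needs protection against being assumed on behalf of the wrong customer, while the remote roles only need to trust the customer's own central role.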

Runbook Automation uses native IAM integration to automate across multiple cloud accounts.


The same use case can be accomplished with Process Automation (self-hosted), where the software is installed in the central (“master”) AWS account on EC2, ECS, or EKS. In this case, Process Automation “inherits” the IAM entitlements that are associated with the hosts it is installed on:

Process Automation self-hosted in the “Master” account will inherit IAM entitlements directly.


The procedure of assuming a role in a “remote” account can be repeated any number of times within a single Runbook Automation project (workspace), which provides a simple way to dispatch automation to numerous AWS accounts.
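Repeating the assume-role step per account can be sketched as a loop: assume each remote role, build an SSM client from the temporary credentials, and send the same command in every region. This is an illustration of the pattern, not how Runbook Automation implements it. The STS client and client factory are injected so the sketch runs without AWS; with boto3 they would be `boto3.client("sts")` (holding central-account credentials) and `boto3.client` itself. Role ARNs, regions, and the session name are assumptions.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

Key = Tuple[str, str]  # (remote role ARN, region)


def dispatch_to_accounts(
    sts_client: Any,
    client_factory: Callable[..., Any],
    remote_role_arns: Iterable[str],
    regions: Iterable[str],
    send_command_kwargs: Dict[str, Any],
) -> Tuple[Dict[Key, str], Dict[Key, str]]:
    """Assume a role in each remote account and run the same SSM command in each region.

    Returns (command_ids, errors), both keyed by (role_arn, region), so one
    failing account or region does not stop the others.
    """
    regions = list(regions)  # iterated once per account
    command_ids: Dict[Key, str] = {}
    errors: Dict[Key, str] = {}
    for role_arn in remote_role_arns:
        try:
            creds = sts_client.assume_role(
                RoleArn=role_arn,
                RoleSessionName="ssm-fanout-sketch",  # hypothetical session name
            )["Credentials"]
        except Exception as exc:  # e.g. AccessDenied from STS
            for region in regions:
                errors[(role_arn, region)] = f"assume_role failed: {exc}"
            continue
        for region in regions:
            try:
                # Temporary credentials from STS scope this client to the remote account.
                ssm = client_factory(
                    "ssm",
                    region_name=region,
                    aws_access_key_id=creds["AccessKeyId"],
                    aws_secret_access_key=creds["SecretAccessKey"],
                    aws_session_token=creds["SessionToken"],
                )
                resp = ssm.send_command(**send_command_kwargs)
                command_ids[(role_arn, region)] = resp["Command"]["CommandId"]
            except Exception as exc:
                errors[(role_arn, region)] = str(exc)
    return command_ids, errors
```

Collecting errors per account and region, rather than raising on the first failure, matches the goal of dispatching to many accounts: a single misconfigured role should show up in the results instead of aborting the whole run.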

Combined with Runbook Automation’s native integrations and self-service interface, this lets users delegate pre-approved procedures to individuals with less technical expertise or to those who should not have full access to all cloud environments.

Using Runbook Automation to orchestrate tooling such as AWS Systems Manager across accounts not only saves time but also lets more people throughout the organization leverage the power of automation.

To implement this solution with Runbook Automation, follow the steps outlined in this How To Article. A demo of this solution is shown here:

If you do not yet have a Runbook Automation account, sign up for a free trial here.