What is a Runbook?
Life in operations is a mix of planned and unplanned work. Often we come up against incidents or tickets where we don’t know the solution offhand. Finding a fix could mean a quick Google search, looking through the company wiki or docs, checking locations for shared scripts, asking a coworker, or escalating the issue to a different department. We could spend hours trying to solve an issue and end up pushing forward a solution that may or may not be the company’s best practice.
This is where runbooks come into play. A runbook is an actionable process that is followed when these common issues and tasks occur, giving the operator standardized, detailed instructions for resolving the issue quickly and effectively.
What is a Runbook?
A runbook is a detailed “how-to” guide for completing a commonly repeated task or procedure within a company’s IT operations process (e.g., provisioning, software updates and deployments, configuration changes, and opening ports). Runbooks are created to give everyone on the team, new or experienced, the knowledge and steps to quickly and accurately resolve a given issue. For example, a runbook may outline routine operations tasks such as patching a server or renewing a website’s SSL certificate.
Think of a runbook as a recipe. It provides detailed instructions for completing a specific task in a quick and efficient manner based on previous experiences with resolving the issue. Runbooks allow more experienced members on the team to share their knowledge so newer or more junior members can more easily resolve commonly faced issues themselves. It also means all team members can quickly refresh their memory and follow detailed steps without having to memorize countless individual procedures.
When Should Runbooks be Used?
Runbooks are extremely helpful for incident response operations. Creating runbooks for specific incidents builds a shared pool of knowledge and expertise that would otherwise live solely in the heads of experts. With detailed, up-to-date runbooks, there is less need for escalation, and companies can often function with smaller on-call IT teams.
Runbooks can also be used for day-to-day IT operations activities like regular maintenance of IT systems and applications. For example, a runbook can outline common tasks such as creating database backups or updating access permissions.
A runbook can also be:
- Manual: Step-by-step instructions followed by the operator
- Semi-Automated: A combination of operator-followed steps with automated steps
- Fully-Automated: All steps are automated and require no operator
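As a rough illustration of the three types above, the sketch below models a runbook as a list of steps, where a step with no attached action is one the operator performs by hand. The names `Step`, `classify`, and `run` are hypothetical and chosen only for this example; they are not part of any particular runbook tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One runbook step; `action` is None for steps the operator performs by hand."""
    description: str
    action: Optional[Callable[[], None]] = None


def classify(steps: List[Step]) -> str:
    """Label a runbook by how many of its steps are automated."""
    automated = sum(1 for step in steps if step.action is not None)
    if automated == 0:
        return "manual"
    if automated == len(steps):
        return "fully-automated"
    return "semi-automated"


def run(steps: List[Step], confirm: Callable[[str], None]) -> None:
    """Execute automated steps; hand manual steps to the operator via `confirm`."""
    for step in steps:
        if step.action is not None:
            step.action()
        else:
            confirm(step.description)


# Hypothetical runbook mixing one manual and one automated step.
rotated = []
example = [
    Step("Notify the on-call channel"),
    Step("Rotate the application log", action=lambda: rotated.append("done")),
]
print(classify(example))  # semi-automated
```

In this sketch, `confirm` could be something as simple as `print`, or a prompt that waits for the operator to acknowledge each manual step before the runbook continues.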
Once a runbook is created, it should also be constantly updated to ensure it is the most effective solution. Runbooks should always contain the most up-to-date information and account for any new methodologies within a company’s operations.
The best and most effective runbooks are those that are constantly evolving with product and process changes, as well as easily adaptable to new rollouts.
What is the Difference Between a Runbook and a Playbook?
In the IT world, runbooks and playbooks are often confused with one another. However, they are actually quite different. A playbook deals with the overarching responses to larger issues and events, and can include multiple runbooks and team members as part of the complete workflow.
Going back to our previous analogy, if a runbook is a recipe, then the playbook would be the guidebook for hosting a given social event. The recipe is needed to effectively cook the meals, but the food is just one aspect of the entire event.
The playbook accounts for the big picture while the runbooks help outline smaller individual tasks.
Creating a Runbook Template for Your Company
Step 1: Planning a New Runbook
When planning a new runbook, it’s important to consider two things:
- What are the most common incidents or tasks your team faces?
- What have been the best solutions for effectively handling these in the past?
Taking a look at detailed incident reports and post mortems can show you some areas in your processes where a runbook can be effectively implemented. You can also look at your ticketing system to see where there are common, recurring tasks assigned to your team. Adding runbooks for commonly recurring tasks or issues will help increase the overall speed of your operations and ensure accuracy and efficiency.
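One way to spot recurring tasks is to count how often the same ticket title appears. The following is a minimal sketch that assumes you can export ticket titles as plain strings; the function name `recurring_tasks` and the default threshold of three occurrences are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def recurring_tasks(titles: Iterable[str], min_count: int = 3) -> List[Tuple[str, int]]:
    """Return (title, count) pairs seen at least `min_count` times, most frequent first."""
    # Normalize case and whitespace so trivially different titles group together.
    counts = Counter(" ".join(title.lower().split()) for title in titles)
    return [(title, n) for title, n in counts.most_common() if n >= min_count]
```

Titles that clear the threshold are candidates for a runbook. Real ticket data is usually messier than this, so the normalization step would likely need tuning for your own system.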
For example, if your team regularly has to renew a website’s SSL certificate, a runbook for that task would provide the operator with detailed instructions for completing the task correctly and with optimal speed. A runbook can even be fully automated to require no operator, as with running a website audit.
Once you’ve identified a task where a runbook could be established, it’s important to find and document the optimal solution. Take a look at the same incident reports and postmortems to see how this task has been resolved in the past, and which of those ways is the most efficient and accurate. Oftentimes, an expert can provide useful information based on their past experiences handling certain issues. In this case, document what they consider the best practice for resolving the issue or task. The runbook should include the agreed-upon best possible solution and present it clearly for the operator.
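To make the SSL example concrete, a runbook's first step might be to check how close a certificate is to expiring. The sketch below uses Python's standard `ssl.cert_time_to_seconds` to parse the expiry timestamp. The function names and the 30-day threshold are assumptions for illustration, not a recommended policy.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by ssl.SSLSocket.getpeercert(),
    e.g. "Jun  1 12:00:00 2030 GMT".
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True when the certificate expires within `threshold_days` (a hypothetical policy)."""
    return days_until_expiry(not_after, now) <= threshold_days
```

In practice, the `not_after` value would come from the `"notAfter"` field of the certificate returned by `getpeercert()` on a live TLS connection. Passing `now` in explicitly keeps the check easy to test.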
Step 2: Write Your Runbook
Once you’ve determined the procedure for your runbook, you can begin documenting it. There are a few things to remember when creating your new runbook:
- Keep it clear and simple – leave out unnecessary details
- Use documentation language that is easy to understand and follow
- Make it specific and unique to your processes
- Keep it flexible and adaptable to changes in your systems and applications
Your runbooks should also be consistent across all applications. Make sure they are each structured in the same way, and provide the operator with all the needed details. For example, make the naming and headers consistent.
Once you’ve completed the runbook, it’s important to field test the documented process and make any updates or changes as needed.
According to Tom Limoncelli, an author and ex-Google sysadmin, there are seven important sections that each runbook you create should have:
- Service Overview
- Service Build Information
- Instructions for Deploying the Software
- Instructions for Common Tasks
- “Pager Playbook” (An outline of every possible monitoring system alert and step-by-step instructions for when they are triggered)
- Disaster Recovery Plans
- Service Level Agreement
You can read more about these seven sections on Tom’s website.
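If your runbooks are stored as documents with headings, a small check can help keep them consistent with the seven sections listed above. This is a minimal sketch that assumes you can extract each runbook's headings as strings; `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names used only for this example.

```python
from typing import Iterable, List

# The seven sections listed above, in order.
REQUIRED_SECTIONS = [
    "Service Overview",
    "Service Build Information",
    "Instructions for Deploying the Software",
    "Instructions for Common Tasks",
    "Pager Playbook",
    "Disaster Recovery Plans",
    "Service Level Agreement",
]


def missing_sections(headings: Iterable[str]) -> List[str]:
    """Return required sections absent from a runbook's headings, in template order."""
    # Compare case-insensitively and ignore surrounding whitespace.
    present = {heading.strip().lower() for heading in headings}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in present]
```

An empty result means the runbook has every section in the template. A check like this only confirms that the headings exist, not that the content under them is complete or current.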
Step 3: Test, Update, and Improve Your Runbooks
Once a runbook is created, it’s not a matter of “set it and forget it.” Runbooks should be constantly tested and updated to ensure they are functioning at optimal levels, even as your systems or applications change. A runbook is best when it is flexible and easily adaptable to the ever-changing environment of IT operations.
You can automate your runbooks using PagerDuty Runbook Automation. To learn more about how PagerDuty can help implement efficient processes like runbooks and runbook automation, contact your account manager and schedule a demonstration or trial today.