Some people inherit treasured jewelry, lavish mansions, or even grand fortunes. Software engineers inherit legacy code, written by people no longer working at the company, that is in need of more yak-shaving than a team of grizzled Stanley Cup winners after their long road to victory. Which will undoubtedly be the Toronto Maple Leafs this year.
See Brent Burns’ glorious playoff beard pictured below.
OK, that might be an exaggeration, but my team (the software one, not the hockey one) found ourselves in a similar spot a couple of weeks ago. We had finished our latest project and were starting a new one, which involved rewriting an old (5+ years) Scala service in Elixir. This service backed PagerDuty’s webhook processing system.
The decision to undertake this project was made for two main reasons:
- To help the company standardize on Elixir due to an org-wide decision made earlier this year.
- To give my team better visibility into how the service actually worked. Like most services that have gone through five years’ worth of changing business requirements, increased scale, and developers coming and going, this code had become difficult to read and maintain. What resulted were files resembling this:
Forgive me for the oozing sarcasm; it’s my way of coping.
This rewrite project had actually started a year ago but was put on pause. What resulted was a relatively small Elixir code base that my team could use as a starting point for our glorious rewrite. This code base wasn’t in nearly as bad a spot as our 5+ year-old code base but it was originally written by people who were quite new to Elixir (I’m including myself as one of those people).
It followed the convenient approach of spreading your business logic across lots of different parts of the application. This made it hard for new team members to get a big-picture view of the service’s application logic without travelling to the deep, dark depths of Mordor to figure out that the code that expires stale webhooks is embedded in the Parser module, whose sole responsibility should be encoding/decoding webhooks to/from JSON.
Shortly before starting this project, my team was encouraged to take an excellent Elixir course that has unofficially become required training for all full-stack engineers at PagerDuty. This course is called “Elixir for Programmers” by Dave Thomas, aka The Coding Gnome (who I learned was also one of the authors of the timeless book, “The Pragmatic Programmer”).
One of the principles emphasized by the course is the importance of keeping your business logic separate from everything else, especially your framework. This really resonated with me, so much so that for my next Hack Day I refactored the Elixir code base mentioned above to adhere to this principle. What resulted was one divine module which combined all of our previously scattered business logic into one clear pipeline that would call out to other modules.
Here’s what the rest of this post looks like:
- The Original Code
- The New Code
- An Aside About Elixir Pipelines
- Back to the New Code
- Did This Refactor Actually Help???
- Recap
The Original Code
Note that this code is a simplified form of our actual code to help illustrate the point of this blog post better. For more information about how PagerDuty webhooks actually work, see our dev docs and webhooks setup guide.
So, for this blog post, let’s assume that the webhook processing service has the following business logic:
- Receive webhooks from an upstream service that generated them in response to an incident being triggered/acknowledged/resolved.
- Parse the received webhooks from a JSON string (a hedged sample of the shape is sketched right after this list).
- Check if the webhooks are older than a certain threshold (i.e. expired). If they’re expired, drop them.
- Send the webhooks to their endpoints via an HTTP POST request. These endpoints can be internal, customer-defined endpoints or external tools that PagerDuty integrates with like Slack, Jira and ServiceNow.
- Persist these webhook endpoint responses into a database (can be used to help debug customer issues).
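To make the parsing step concrete, here’s a hypothetical example of what one of these raw webhooks might look like as a JSON string. The field names (event.endpoint, event.payload, event.created_at) are my assumption based on the simplified Parser shown later; the real PagerDuty payload is richer than this:

{
  "event": {
    "endpoint": "https://hooks.example.com/incoming",
    "payload": {"incident": "triggered"},
    "created_at": 1556728500000
  }
}

Note that created_at is assumed to be a millisecond Unix timestamp, which is what the expiry check later in this post expects.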
If you’re like me and learn best through examples, here’s a real-life example of our PagerDuty Slack Integration. You can set up a webhook so that whenever an incident gets triggered/acknowledged/resolved, a notification gets sent to your team’s channel in Slack (so your whole team can know that it took you over 15 minutes to acknowledge that one incident last Friday night when you were “on-call” and “forgot” to bring your laptop with you to the bar).
Enough talk, here’s some code.
Note that all the code is available on Github.
Receiver receives the webhook from the upstream service, extracts the webhook from the received JSON string via Parser, and then sends the webhook to its endpoint via Sender:
defmodule Receiver do
  require Logger

  def receive(raw_webhook) do
    case Parser.parse(raw_webhook) do
      {:ok, endpoint, payload} ->
        # if it was parsed successfully, send it to its endpoint
        Sender.send(endpoint, payload)

      {:error, error_msg} ->
        # if an error was returned from trying to parse it, don't send it
        Logger.error("error - #{error_msg}")
    end
  end
end
Parser decodes the webhook from a JSON string. It also returns an error if the webhook has expired:
defmodule Parser do
  @expired_threshold_seconds 5

  def parse(raw_webhook) do
    decoded_webhook = JSON.decode(raw_webhook)
    endpoint = decoded_webhook.event.endpoint
    payload = decoded_webhook.event.payload

    # how long ago the webhook was created, in milliseconds
    time_since_created = Time.now - decoded_webhook.event.created_at
    is_expired = time_since_created > @expired_threshold_seconds * 1000

    case is_expired do
      true -> {:error, "webhook expired"}
      false -> {:ok, endpoint, payload}
    end
  end
end
Sender sends the webhook to its endpoint. It also handles the different types of responses returned from the endpoint by persisting them to the db:
defmodule Sender do
  require Logger

  def send(endpoint, payload) do
    HTTP.post(endpoint, payload)
    |> handle_response(endpoint, payload)
  end

  defp handle_response({200, response_body}, endpoint, payload) do
    Logger.info("webhook sent successfully")
    DB.save_successful_response(endpoint, payload, response_body)
  end

  defp handle_response({404, response_body}, endpoint, payload) do
    Logger.error("failed to send webhook, endpoint returned 404")
    DB.save_failed_response(endpoint, payload, response_body)
  end
end
DB stores the different endpoint responses in the db:
defmodule DB do
  def save_successful_response(endpoint, payload, response) do
    MySQL.insert("successes", endpoint, payload, response)
  end

  def save_failed_response(endpoint, payload, response) do
    MySQL.insert("failures", endpoint, payload, response)
  end
end
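Tying the pieces together, here’s a hedged usage sketch of how a raw webhook flows through these modules (the JSON string and endpoint are hypothetical, matching the fields the simplified Parser reads):

# Receiver kicks off the whole flow:
# parse -> expiry check -> send -> persist the endpoint's response
raw_webhook = ~s({"event": {"endpoint": "https://hooks.example.com/incoming", "payload": {"incident": "triggered"}, "created_at": 1556728500000}})

Receiver.receive(raw_webhook)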
Here’s a diagram of the business logic and how it’s spread out over different modules:
In order to piece together the diagram above, we have to start in Receiver, go to Parser, go back to Receiver, go to Sender, and finally finish off in DB.
Along the way, we encounter some fundamental logic about how webhooks get processed, hidden in modules that seemingly have nothing to do with that logic. For example, the logic about not processing a webhook if it has expired is contained in the Parser module, whose sole responsibility should be decoding the webhook from JSON. Similarly, the logic about storing the different types of webhook endpoint responses in the database is contained in the Sender module, whose sole responsibility should be sending the webhook to its endpoint.
Overall, this webhook processing logic wasn’t too hard to figure out for the simplified code above. But for my team’s actual application – which had a lot of additional complexity – it was quite daunting.
The code above also does not take advantage of the fact that webhook processing can be thought of as a pipeline, something that functional programming languages express beautifully: the code takes in some initial state and keeps modifying it through different stages to produce an output. Elixir implements a simple version of these pipelines via its pipe operator (|>).
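As a quick, self-contained illustration (standard library functions only, nothing from our service), the pipe operator passes the result of each expression as the first argument of the next call:

"hello world"
|> String.upcase()
|> String.split()
# => ["HELLO", "WORLD"]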
The New Code
To centralize our webhook processing logic into one module and to express it as a pipeline which modifies some state, I moved all the bits of logic spread throughout the code into a new module called Processor.
At the top of the Processor module, the state used by the pipeline is defined in WebhookState:
defmodule Processor do
  require Logger

  @expired_threshold_seconds 5

  defmodule WebhookState do
    defstruct(
      endpoint_url: "",
      payload: "",
      created_at: "",
      response: "",
      result: ""
    )
  end
Within Processor, the process function contains our business logic pipeline that modifies the WebhookState:
  def process(webhook = %WebhookState{}) do
    # here is the pipeline containing all the business logic :) !
    # if the current stage returns an :ok, we move onto the next stage
    # if it returns an :error, the pipeline is stopped and we move
    # onto the error handler in the else block at the bottom
    with(
      {:ok, webhook} <- check_webhook_expired(webhook),
      {:ok, webhook} <- send_webhook(webhook),
      {:ok, webhook} <- handle_response(webhook),
      {:ok} <- save_response_to_db(webhook)
    ) do
      Logger.info("webhook processed successfully :) !")
      :ok
    else
      {:error, msg} ->
        msg = "error processing webhook :( ! - #{msg}"
        Logger.error(msg)
        :error
    end
  end
Lastly, the different stages of the pipeline are defined:
  defp check_webhook_expired(webhook = %WebhookState{created_at: created_at}) do
    # this was moved from Parser to here :) !
    time_since_created = Time.now - created_at
    is_expired = time_since_created > @expired_threshold_seconds * 1000

    case is_expired do
      true ->
        # if the webhook is expired, return an :error to stop the pipeline
        {:error, "webhook expired"}

      false ->
        {:ok, webhook}
    end
  end
  defp send_webhook(webhook = %WebhookState{
    endpoint_url: endpoint_url,
    payload: payload
  }) do
    {status_code, response_body} = Sender.send(endpoint_url, payload)

    # store the response returned from the endpoint in WebhookState so
    # that the next stage of the pipeline (handle_response) can access it
    webhook = %WebhookState{webhook | response: {status_code, response_body}}
    {:ok, webhook}
  end
  defp handle_response(webhook = %WebhookState{response: {200, _response_body}}) do
    # this was moved from Sender to here :) !
    Logger.info("webhook sent successfully")

    # store the result in WebhookState so that the next stage of the
    # pipeline (save_response_to_db) can access it
    webhook = %WebhookState{webhook | result: "success"}
    {:ok, webhook}
  end

  defp handle_response(webhook = %WebhookState{response: {404, _response_body}}) do
    # this was moved from Sender to here :) !
    Logger.error("failed to send webhook, endpoint returned 404")

    # store the result in WebhookState so that the next stage of the
    # pipeline (save_response_to_db) can access it
    webhook = %WebhookState{webhook | result: "failure"}
    {:ok, webhook}
  end
  defp save_response_to_db(%WebhookState{
    result: "success",
    endpoint_url: endpoint_url,
    payload: payload,
    response: {_status_code, response_body}
  }) do
    # this was moved from Sender to here :) !
    DB.save_successful_response(endpoint_url, payload, response_body)
    {:ok}
  end

  defp save_response_to_db(%WebhookState{
    result: "failure",
    endpoint_url: endpoint_url,
    payload: payload,
    response: {_status_code, response_body}
  }) do
    # this was moved from Sender to here :) !
    DB.save_failed_response(endpoint_url, payload, response_body)
    {:ok}
  end
end
Here’s a diagram of the refactored logic, all contained in the Processor module, which calls out to the other modules:
With the new code, all of our webhook processing logic is contained within Processor; instead of jumping around modules to see it, we only have to look in one place.
There is also a simple pipeline defined in process. It first checks whether the webhook has expired; if it has, an error is returned and the pipeline is stopped, so the webhook does not get sent out. If it hasn’t expired, the webhook is sent to its endpoint in the next stage. Then the different responses returned by the endpoint are handled and saved to the database.
The state needed for the pipeline was defined at the top of the module in WebhookState, and the pipeline modifies this state as it moves through its different stages. For example, after the send_webhook stage sends the webhook to its endpoint, the response is stored in the state so that the next stage, handle_response, can access it.
Lastly, the scattered logic about not sending expired webhooks and handling different webhook endpoint responses was moved from the Parser and Sender modules into the Processor module. Now the Parser and Sender modules only contain logic related to their actual names.
Now before we finish off by cleaning up the old modules, I have something to say about Elixir pipelines.
An Aside About Elixir Pipelines
If you’re familiar with Elixir, you might be wondering why I didn’t construct a simple pipeline using the aforementioned pipe operator (|>), something like:
webhook
|> check_webhook_expired
|> send_webhook
|> handle_response
|> save_response_to_db
Those pipelines are great when you have a simple happy path that, given some initial state, modifies it through different stages and produces an output. But they’re not well-suited for pipelines like webhook processing, where any stage can fail and needs to bring the whole pipeline to a halt.
For example, when check_webhook_expired finds that the webhook has expired, all processing should stop and an error should be returned. With the simple pipeline code above, you wouldn’t be able to stop the pipeline, so every subsequent stage would have to handle this error case and skip over its own processing. This is quite messy.
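To make that messiness concrete, here’s a hedged sketch (hypothetical code, not from our service) of what the stages would look like if we threaded error tuples through a plain |> pipeline. Every stage needs an extra clause whose only job is to pass a previous error along:

# each stage must pattern match on a possible error from an earlier
# stage and pass it straight through, skipping its own work
defp send_webhook({:error, _msg} = error), do: error

defp send_webhook({:ok, webhook}) do
  # ...the actual sending logic...
  {:ok, webhook}
end

defp handle_response({:error, _msg} = error), do: error

defp handle_response({:ok, webhook}) do
  # ...the actual response handling...
  {:ok, webhook}
end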
Elixir’s with statement solves this problem cleanly. It pattern matches on the result returned from each pipeline stage, so that if any stage returns an error, all processing stops and the code moves to the error handler defined in the else block.
def process(webhook = %WebhookState{}) do
  with(
    {:ok, webhook} <- check_webhook_expired(webhook),
    {:ok, webhook} <- send_webhook(webhook),
    {:ok, webhook} <- handle_response(webhook),
    {:ok} <- save_response_to_db(webhook)
  ) do
    Logger.info("webhook processed successfully :) !")
    :ok
  else
    {:error, msg} ->
      msg = "error processing webhook :( ! - #{msg}"
      Logger.error(msg)
      :error
  end
end
If you want to learn more, check out:
- The Elixir docs about the with statement
- Opus, a library that lets you define more complex pipelines with your own “pluggable business logic components” (we actually use Opus at PagerDuty for some of our services)
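For a taste of Opus, here’s a minimal pipeline adapted from the examples in its README (the steps are hypothetical, not our webhook code):

defmodule ArithmeticPipeline do
  use Opus.Pipeline

  # each step transforms the input and hands it to the next step
  step :add_one, with: &(&1 + 1)
  step :double, with: &(&1 * 2)
end

ArithmeticPipeline.call(1)
# => {:ok, 4}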
Back to the New Code
Now the only thing left to do is remove all the business logic from the other modules.
Receiver still receives the webhook and parses it, but instead of sending it directly to its endpoint, it now passes the webhook to the pipeline defined in Processor.process:
defmodule Receiver do
  require Logger

  alias Processor.WebhookState

  def receive(raw_webhook) do
    case Parser.parse(raw_webhook) do
      {:ok, endpoint, payload, created_at} ->
        # create the initial webhook pipeline state and start the pipeline
        %WebhookState{
          endpoint_url: endpoint,
          payload: payload,
          created_at: created_at
        }
        |> Processor.process()

      {:error, error_msg} ->
        Logger.error("error - #{error_msg}")
    end
  end
end
Parser is now solely responsible for decoding the webhook from a JSON string:
defmodule Parser do
  def parse(raw_webhook) do
    decoded_webhook = JSON.decode(raw_webhook)
    endpoint = decoded_webhook.event.endpoint
    payload = decoded_webhook.event.payload
    created_at = decoded_webhook.event.created_at

    # no more expired webhook logic :) !
    {:ok, endpoint, payload, created_at}
  end
end
Sender is now solely responsible for sending the webhook to its endpoint:
defmodule Sender do
  def send(endpoint, payload) do
    HTTP.post(endpoint, payload)
  end

  # no more logic about handling different types of responses
  # and storing them to the db :) !
end
DB didn’t have any business logic in the first place, so it remained unchanged:
defmodule DB do
  def save_successful_response(endpoint, payload, response) do
    MySQL.insert("successes", endpoint, payload, response)
  end

  def save_failed_response(endpoint, payload, response) do
    MySQL.insert("failures", endpoint, payload, response)
  end
end
If you want to see a diff of the Old Code and New Code on Github, check out this link. Here’s what the diff of the Parser module looks like:
Once again, a gentle reminder that all of the code above is a simplification of our actual webhook processing pipeline. If I see any support requests (or even better, tweets) saying something like “Does PagerDuty seriously not even bother to retry sending failed webhooks???”, I will simply shake my head.
Did This Refactor Actually Help???
Yes, it did! But don’t take my word for it, take my team’s:
This brought a tear to my eye :’)!
Everyone now has a better understanding of this business logic, which will enable us to make more informed, confident decisions about any changes we want to make as part of our rewrite.
One story I’d like to share is from one of our team meetings where we were reviewing the code from my refactor PR. I was going over our equivalent of the Processor module, which had a similar piece of logic around not sending out expired webhooks. Someone asked, “Wait, why are we doing that?” And then silence… No one could give a good answer. This prompted an investigation into why we were, in fact, doing that. After a long journey into the deep, dark depths of Confluence (arguably scarier than Mordor), we uncovered an old doc that clearly explained why the code did that, and it made perfect sense!
Recap
So this is how I refactored the business logic behind our webhook processing pipeline to be more functional.
Recall some of the problems with having your application logic spread out across different modules:
- It takes a long time to move through all these different modules and understand what is actually going on
- You end up with poor separation of concerns in your modules
- It makes it hard to get a big picture view of your business logic
Recall some of the benefits of centralizing all this logic into one module:
- You only have to look at one module to see all the business logic
- You get better separation of concerns – Parser is only responsible for decoding webhooks, Sender is only responsible for sending webhooks to their endpoints
- It takes advantage of functional programming concepts like pipelines
- I didn’t mention this earlier, but it also makes it easier to unit test this logic (see the sketch right after this list)
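As a hedged illustration of that last point, here’s a sketch of an ExUnit test against the simplified Processor above (it leans on this post’s simplified Time.now). Because the expiry check is the first stage of the with pipeline, an expired webhook makes process/1 return :error before any HTTP call is made:

defmodule ProcessorTest do
  use ExUnit.Case

  alias Processor.WebhookState

  test "expired webhooks stop the pipeline before sending" do
    # created a minute ago, well past the 5 second expiry threshold
    expired_webhook = %WebhookState{
      endpoint_url: "https://hooks.example.com/incoming",
      payload: "{}",
      created_at: Time.now - 60_000
    }

    # the first stage returns {:error, "webhook expired"}, so the
    # pipeline short-circuits and Sender is never called
    assert :error == Processor.process(expired_webhook)
  end
end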
I would be remiss not to mention that there are some benefits to embedding your business logic within different modules of your app. This can lead to less code and faster development, and it is one of the fundamental ideas behind frameworks like Ruby on Rails that prefer “convention over configuration”. But even within the Rails community, there is an ongoing, passionate debate about whether you should centralize your business logic or keep it spread out over your application. Hopefully you will read this blog post and at least consider the idea of centralizing your business logic.
Big shout-out to Dave Thomas and his “Elixir for Programmers” course which inspired me to spend a whole day (and weekend) refactoring the business logic of my team’s code.
So have I inspired you to drop everything and get refactoring? Or do you vehemently disagree with keeping your business logic separate from everything else? All feedback and discussion is welcome in our PagerDuty Community forum!
If you’d be interested in working for a company that empowers its engineers to do things like refactoring code to make it more developer-friendly, check out our current job openings.