Recently, my manager Tom and I collaborated on a handy Slack bot to quickly solve a problem we'd noticed. I’d like to share the tool we created and, from it, a template for low-effort serverless Slack bots. Because porting a Python tool to run in AWS Lambda is so simple, it was easy for two people to collaborate on while working independently, it will require no maintenance, and it gave the business a well-packaged solution that I thought was worth sharing!
One of Production Engineering’s functions at OVO is to support teams running postmortems to learn from incidents and outages. Our service management team have been working hard on automating and creating tooling around incident management, but when teams came to reflect on these incidents, one consistent pain point delayed productive discussion: generating a timeline of events.
We built a tool that can be called from a public Slack channel that returns all messages marked with a reaction emoji to the user that runs it, with the intention of easily exporting the timeline of events from a channel during an incident. Here’s the code and how to deploy it: https://github.com/ovotech/pm-timeline-generator - all you need is an AWS account to host it.
Solving the problem
We’d heard about Monzo’s open source tooling, and while our own incident management stack doesn’t line up with theirs (we already integrate with JIRA quite closely), its feature to pull a timeline out of Slack messages inspired us to create our own lightweight version.
Whilst I spun up the Monzo Response stack to see if we could either use it, fork it, or leverage parts of it, Tom wrote a Python CLI tool to get what we wanted done: you can see that here. This quick solution was immediately useful for a couple of postmortems, saving time for the people participating. Writing in Python with the slackclient library let Tom interact with the Slack APIs easily and iterate quickly, so he had a working tool in a couple of hours.
Over the course of the week, it was clear that the CLI tool was useful and Tom got a few messages to ask if he could run the tool for people across the wider business - postmortem culture isn’t limited to just our software teams, and running a Python script with a Slack OAuth token wasn’t the most straightforward user experience. I’ve spent some time previously writing tools that trigger from Slack, and using a slash command seemed like a great way to let everyone engage with it easily.
How the Timeline Tool works
The Slack interaction with the tool for the user is simple - a Slack app is configured to be triggered by /timeline :emoji:. In the backend, Slack POSTs that string to an AWS API Gateway, which then triggers a lambda to run the logic and return the output as a direct message to the user with an attached file of all marked messages. The normal user experience looks like this:
The AWS side involves some interesting decisions to manage the Slack app integration, mainly because Slack expects all its API requests to be responded to within 3 seconds or it will start retrying. Our Timeline app needs longer than that when it’s used in channels with a lot of data. In the usual short-lived incident channels it’s fine, but as in the example above it’s nice that it just works when scraping the #music channel’s history too!
Here’s what the system design looks like:
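One way to stay inside that 3 second window is to acknowledge Slack straight away and queue the slow work on a second function. The following is a minimal sketch of that hand-off rather than the code in our repository: the names `hand_off` and `ack` are hypothetical, and the optional `client` argument is only there so the function can be exercised without AWS credentials. It relies on boto3's Lambda `invoke` call with `InvocationType="Event"`, which queues the invocation asynchronously instead of waiting for the result.

```python
import json


def hand_off(payload, function_name, client=None):
    """Queue the slow work on another Lambda and return without waiting for it."""
    if client is None:
        # Imported lazily so the sketch can be tried with a stand-in client.
        import boto3
        client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,   # hypothetical name of the slow function
        InvocationType="Event",       # asynchronous: do not wait for the result
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # Lambda reports 202 when an asynchronous invocation has been accepted.
    return response["StatusCode"]


def ack(text):
    """Build the quick HTTP 200 reply that API Gateway passes back to Slack."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # "ephemeral" means only the user who ran the command sees the reply.
        "body": json.dumps({"response_type": "ephemeral", "text": text}),
    }
```

The function that receives the request would call `hand_off(...)` and then return `ack("Working on it...")`, leaving the second function free to take as long as the channel history needs.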
- Slack sends an HTTP POST to the API Gateway.
- The API Gateway (in LAMBDA_PROXY mode) sends the entire POST to the Entrypoint lambda as an
- The Entrypoint lambda:
  - Checks this is a trusted message from Slack (via the Signing Secret key).
  - Checks this is a valid input for the Timeline Tool and returns an appropriate failure message if it’s not, along with the HTTP 200 OK that Slack needs to know its POST was received.
  - If both of these checks pass, forwards the event body to the Main lambda.
- The Main lambda then runs our custom logic, and can use the user_id in the event to message the requesting user.
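The two Entrypoint checks can be sketched roughly as below. This is an illustration under a few assumptions, not the code from the repository: the function names, the emoji pattern, the 401 reply for untrusted requests and the injected `forward` callable are all hypothetical. The signature check follows Slack's documented signing scheme: an HMAC-SHA256 of `v0:<timestamp>:<raw body>` keyed with the Signing Secret, compared against the `X-Slack-Signature` header.

```python
import hashlib
import hmac
import re
import time
from urllib.parse import parse_qs

# Assumed shape of a valid argument, e.g. ":pushpin:" (hypothetical pattern).
EMOJI_PATTERN = re.compile(r"^:[a-z0-9_+\-]+:$")


def is_from_slack(signing_secret, timestamp, body, signature, now=None):
    """Check the request signature and reject requests older than five minutes."""
    now = time.time() if now is None else now
    try:
        age = abs(now - int(timestamp))
    except (TypeError, ValueError):
        return False
    if age > 300:
        return False
    base = f"v0:{timestamp}:{body}".encode("utf-8")
    digest = hmac.new(signing_secret.encode("utf-8"), base, hashlib.sha256).hexdigest()
    expected = ("v0=" + digest).encode("utf-8")
    return hmac.compare_digest(expected, (signature or "").encode("utf-8"))


def parse_command(body):
    """Pull the fields we need out of Slack's form-encoded slash command body."""
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    text = fields.get("text", "").strip()
    if not EMOJI_PATTERN.match(text):
        return None
    return {
        "user_id": fields.get("user_id"),
        "channel_id": fields.get("channel_id"),
        "emoji": text.strip(":"),
    }


def entrypoint(event, signing_secret, forward, now=None):
    """Run both checks on a proxy event, then pass valid commands to `forward`."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    body = event.get("body") or ""
    trusted = is_from_slack(
        signing_secret,
        headers.get("x-slack-request-timestamp"),
        body,
        headers.get("x-slack-signature"),
        now,
    )
    if not trusted:
        return {"statusCode": 401, "body": "invalid signature"}  # assumed reply
    command = parse_command(body)
    if command is None:
        # Still a 200, so Slack knows the POST arrived and shows the hint.
        return {"statusCode": 200, "body": "Usage: /timeline :emoji:"}
    forward(command)  # e.g. an asynchronous invoke of the Main lambda
    return {"statusCode": 200, "body": "Building your timeline..."}
```

Passing `forward` in as a parameter keeps the checks easy to test locally; in a deployed function it would wrap whatever call triggers the Main lambda.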
I’m not going to dig into Tom’s CLI tool - this pattern will work to adapt any Python script with minimal work using the two Lambda pattern. You can compare the CLI release tag with the current master branch and see how simple it was to port the code to the Main lambda.
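To give a feel for how little glue the Main lambda needs, here is a rough sketch of wrapping plain script functions in a handler. It is not Tom's code: `marked_messages`, `format_timeline`, `build_timeline` and the injected `fetch_history` callable are hypothetical names, and the message dictionaries assume the shape Slack returns for channel history (a `ts` string, a `user`, a `text`, and an optional `reactions` list whose entries have a `name`).

```python
from datetime import datetime, timezone


def marked_messages(messages, emoji):
    """Keep only messages carrying the given reaction, oldest first."""
    hits = [
        message for message in messages
        if any(r.get("name") == emoji for r in message.get("reactions", []))
    ]
    return sorted(hits, key=lambda message: float(message["ts"]))


def format_timeline(messages):
    """Render one line per message: UTC time, user ID, then the message text."""
    lines = []
    for message in messages:
        when = datetime.fromtimestamp(float(message["ts"]), tz=timezone.utc)
        user = message.get("user", "unknown")
        lines.append(f"{when:%Y-%m-%d %H:%M:%S} UTC  {user}: {message.get('text', '')}")
    return "\n".join(lines)


def build_timeline(event, fetch_history):
    """Core of a Main handler: fetch the channel history, filter it, format it.

    `fetch_history` is a hypothetical callable that takes a channel ID and
    returns a list of message dicts, standing in for the Slack API calls.
    """
    messages = fetch_history(event["channel_id"])
    return format_timeline(marked_messages(messages, event["emoji"]))
```

Because the filtering and formatting are ordinary functions, the same code can sit behind a CLI or a Lambda handler; only the thin layer that fetches messages and delivers the result changes.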
I think this is an interesting example of how we can move fast without leaving technical debt behind, and it’s a pattern I’ve leveraged a few times now for ChatOps-style interactions. Prior to creating this timeline tool, I used the same pattern to allow people to quickly create IAM users for a hackathon from Slack by passing the input string to a function leveraging the AWS boto3 library to create a login and DM them a password, again in a couple of hours at the keyboard.
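For a flavour of what that kind of function can look like, here is a minimal sketch; it is a reconstruction for illustration rather than the hackathon code itself. The function names and the password format are assumptions, and the optional `iam` argument exists only so the sketch can be run against a stand-in client. It uses boto3's IAM `create_user` and `create_login_profile` calls.

```python
import secrets
import string


def generate_password(length=20):
    """Random letters and digits; an account password policy may demand more."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def create_hackathon_user(username, iam=None):
    """Create an IAM user with a console login and return the temporary password."""
    if iam is None:
        # Imported lazily so the sketch can be tried with a stand-in client.
        import boto3
        iam = boto3.client("iam")
    password = generate_password()
    iam.create_user(UserName=username)
    iam.create_login_profile(
        UserName=username,
        Password=password,
        PasswordResetRequired=True,  # the user picks their own on first login
    )
    return password
```

The returned password is what the bot would then send to the requesting user as a DM.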
Between ephemeral responses, in-channel responses, DMs and running logic in Python, you can allow a lot of tasks to be run easily via slash command! Please feel free to try out the timeline tool and raise pull requests for improvement or extension, or use it as a starting point for hacking out new ideas!