Following our efforts last year to simulate major incidents and build great incident response skills in our autonomous teams, the service management coaches have been looking for ways to do it more efficiently.
One drawback to building and running a simulation as a role play is the large amount of preparation to be done up front, coupled with the limited-use nature of each scenario: if a team has seen it once, it's no longer effective.
On the recommendation of PagerDuty's incident training, we decided to trial the excellent Keep Talking and Nobody Explodes as an alternative.
In this post I'll tell you how we ran the exercise, what we learned, and how it compared to the role play scenario.
Playing the game
The rules of the game are simple...
- You have 5 minutes to defuse the bomb.
- Information on how to defuse the bomb is contained within a bomb defusal manual.
- The person defusing the bomb cannot look at the manual. The person reading the manual cannot look at the bomb.
- The bomb is made up of several modules, each of which must be solved to defuse the bomb.
- If you make a mistake, you get a strike and the timer ticks down faster.
- If you get three strikes or the timer runs out, the bomb explodes.
To make things harder we revised the rules a little and split the players up into two teams, communicating via Slack...
- Only the Commander & Communicator can see or type in Slack.
- Only the Defuser can see the screen.
- Only the Experts can see the manual. All Experts must play a part.
- The Commander leads.
- The Defuser signals the start of each round.
- During a round, everyone must stay in situ.
We played several rounds and then reconvened to reflect on what we noticed and learned.
So what did we learn and how does this relate to building a great incident response?
The more you do something together, the slicker it gets!
In the first round the bomb exploded after five minutes, with two of three modules solved. Two of the module types were present every time they played. In the final round, the players solved those same two modules within the first minute.
They quickly began to focus on one module at a time. Once everyone knew what to expect and developed a common understanding of cadence and process, they solved problems faster.
They were able to progress to harder bombs in later rounds, and also learned to solve new modules more quickly.
Empathise with others for better communication.
Making decisions, taking technical ownership and communicating are all challenging and come with their own pressures.
Understanding each other's information needs enabled clear, concise exchange of relevant information.
The Expert Team learned quickly to wait for the Defusal Team to give them information about the bomb and modules, responding with targeted questions.
Develop a shared vocabulary.
By design, information contained within the bomb modules is not easily described with regular vocabulary. Take a look at this module and imagine describing each symbol to someone over Slack.
Within a couple of rounds the players had already started to build a common lexicon: "OK, now press the wobbly cactus."
Make use of "dead time".
At first there was a tendency to sit and watch the Slack channel, waiting for a reply from the other side. Communication was very linear and transactional: from the Defuser, to the Commander, to the Communicator, to the Experts and back.
After a few rounds, the Defusal Team was looking ahead to the next module, gathering and sharing information while the Expert Team deciphered the manual.
If you're not typing or fixing, start thinking and anticipating what you may need to do next.
Separate roles and responsibilities.
Separating roles is a tried and tested principle for great incident response. We chose to define the roles as part of the rules so that people could easily understand how to play.
We did leave the Experts to figure out how best to work together. Interestingly, they naturally began to align themselves in a way that best suited the modules: for simple modules, they devoted a single Expert; for complex modules, they worked together to solve the problem and validate their understanding before communicating with one voice.
The players appreciated the benefits of allowing Experts to focus on technical detail and dedicating resource to effective communication.
How does this differ from the simulation role play?
Forcing the use of Slack was a more realistic representation of how we tend to run incident response at OVO, with our engineering teams based in several locations. Communicating well using text comes with specific challenges and requires different sensibilities. That said, learnings here can be carried over into running major incident war rooms with voice or video.
This format enabled us to play a round, reset and repeat quickly. The short rounds and role rotation mean players can make mistakes and learn faster.
The timer on the bomb gives tangible time pressure, which is harder to recreate with a simulation.
The focus here was on teaching great intra-team communication and the exercise did not extend to customer- or stakeholder-facing communication.
There is limited decision-making required when playing Keep Talking. Although the content and configuration of the bomb change as you play, the rules remain the same and there aren't many surprises once you've played a few rounds.
In this exercise we defined and preallocated roles. In the role plays we left the players to self-organise. The latter is better suited to teaching the specific roles we use in our actual incident process.
Tips for doing this yourself
- Don't skimp on setting the scene. Make sure everyone understands the rules and how the game works.
- Give everyone visual context for how the bomb works by showing them a quick glimpse right at the start.
- Defusers should complete the quick, in-game tutorial so they're familiar with the controls.
- Have two facilitators: one to support each team.
- Have people stick to the same role for 2-3 rounds before rotating, giving everyone adequate time to learn before moving on.
- Make sure you do rotate so that people get to spend time in each other's shoes.
- Move up to the next bomb (and therefore increase the difficulty) after the players have solved the current one a couple of times.
- Four Experts is a good limit. If you need more people involved, have them act as silent observers and rotate them in during subsequent rounds. They can bring to the retro what they noticed when they weren't playing.
- Although playing the game lends itself to having the teams in separate locations, it's still best to be colocated for the retro portion.
- You need one license for each instance of the game you intend to run at once.