Production Engineering within OVO has gone through a number of changes since its inception in 2017. At the time, OVO was a much smaller company serving a smaller customer base. Production Engineering was designed to be our take on the DevOps movement, something that helped drive the OVO culture that we continue to operate by to this day.
The original mission of Production Engineering was one based on augmenting a team's capabilities. Production Engineers have never been responsible for running teams' services. In order for teams to be agile, they were given complete autonomy to build and operate their services in the most appropriate manner. Some teams thrived in this environment, but others either didn’t feel comfortable due to a lack of expertise in operational disciplines, or struggled with a lack of time to create robust implementations themselves. It was within these teams that Production Engineers could embed to give assistance.
The assumptions that held true then do not necessarily hold true now. OVO has grown as a company, with a larger number of developers to support that increased customer base, and so the Production Engineering function has also been forced to adapt to ensure that we can continue to serve that original mission to the best of our abilities.
Early Challenges
When I joined OVO as a Production Engineer in mid-2020, I joined a company that was going through a period of change. Our customer base had increased dramatically following the recent acquisition of SSE Energy Services from SSE plc, and the company was gearing up to support migration activities.
Upon joining, I was immediately embedded within the team in OVO Retail that develops the Energy Insights products, and supported the backend engineers and data scientists to transform how they worked. However, after a year of being fully embedded in the team and against the backdrop of the changes that were happening throughout the business, I was beginning to feel that certain aspects of the implementation of Production Engineering were not as good as they could have been.
From my perspective a number of issues were beginning to become clear:
- Lack of capacity. Because I was fully embedded in one team, half a dozen other teams were not getting any support from me.
- Duplication of work. Many teams were solving the same problems over and over again. I remember discovering that at least four teams had, at separate times, each created Terraform modules for running a service on ECS Fargate!
- Divergence of standards. Different engineering teams used different technologies to solve similar problems. Similarly, teams who weren’t able to get any support from Production Engineers were creating suboptimal solutions due to lack of expertise.
Whilst team autonomy is one of the great parts of OVO’s culture, we were beginning to feel pain points due to it. Something needed to change.
Evolution, not Revolution
The original blueprint for Production Engineering was still sound. We just needed to adapt to the changes in the business so that we could continue our mission as effectively as we could. We wanted a model that would enable us to take the best bits of the embedded model that we previously operated under, but could alleviate or resolve the problems that had been identified.
To address lack of capacity, we needed a way to “share” a Production Engineer effectively between teams who needed support. To address duplication of work and divergence of standards, we needed to ensure that Production Engineers collaborated on solutions that would meet the needs of all supported teams.
In short, we needed to form a team of Production Engineers.
Enter the Hybrid Model
I often refer to the model of Production Engineering that we developed as a hybrid model, rather than calling it a centralised team or some other term. I say this because I believe the teams that operate using the model combine the best parts of both the fully centralised and fully embedded models. One of the key things that made the fully embedded model a great fit for OVO was that it enabled teams to retain high levels of autonomy and ownership, and we wanted to make sure that when we made changes to Production Engineering we didn’t take any of that away from teams.
Here is how we operate in the hybrid model.
We spend half of our capacity working directly with the teams we support in what we call engagements. Teams raise requests for us which we triage and work through. This might be helping to debug a production incident, or maybe giving guidance and best practice for creating CI pipelines, or they might have a requirement to build a new service and need support developing appropriate infrastructure.
The other half of our capacity is spent working on our own roadmap items. Our position as an enabling team means we get to see all the common patterns and problems in product teams and that, coupled with our expertise, helps us to identify solutions that the teams themselves may miss.
We purposefully try to limit the scope of our engagements. In the past, long-term engagements between product teams and Production Engineers built dependencies that could result in bottlenecks to delivery. Rather than doing something ourselves, we work with a mentoring mindset to help skill up the product teams and enable them to self-serve in the future.
At the same time, we strive to get the most value out of any engagement that we undertake by trying to make something that can be consumed by others as well. If a team has a requirement for infrastructure we will make a suitable reusable component out of it that other teams can also make use of. If a team needs some advice on tagging their metrics, we will write and publicise a piece of guidance around observability best practices.
The Future of Production Engineering?
As with many decisions in technology, the answer to “Is this the best way to do Production Engineering?” is “It depends.” It works for us in OVO due to our culture - high levels of autonomy and ownership coupled with the agility demanded to achieve Plan Zero.
However, we also acknowledge that there are trade-offs to be made in any organisational model. The hybrid model that we have adopted means that we are now forced to compete with a myriad of other priorities on the backlogs of the teams we support, which can slow down the engagement process.
I am happy with how Production Engineering has evolved so far to support the changing needs of the business, but I am also very aware that the only constant in life is change. In a world where Platform Engineering as a discipline is on the rise and the business continues its focus on providing guard rails for product teams, it may just be a matter of time before Production Engineering is forced to evolve again.
If you would like to be a part of Production Engineering, helping to enable and support teams with their Infrastructure, Pipeline, and Observability challenges, take a look at our careers board for current vacancies!