The Production Engineering team has officially been up and running at OVO for around six months, with our first Production Engineer joining in late January. We've been really pleased with how Prod Eng has gone from idea to reality. Today we are making a big difference in how our development teams write great software.
Production Engineering is not yet in the vocabulary of a typical Software Engineer, despite its usage at Facebook and a handful of other big names. Google describes its SRE function as an implementation of DevOps in its organisation, and we feel the same way about Production Engineering at OVO. It's our interpretation of DevOps, and that interpretation is structured to address the inherent vulnerabilities we have in our organisation design.
We have two pillars of value delivery in Production Engineering: embedding and consulting. Embedding, or engagement, is where we help out a team for a number of months, focusing on a specific area of improvement. For example, one engagement we've completed focused on building custom metrics into an application and migrating to DataDog. This took around two months, after which we were able to move on to another area; in fact, we continued working with the same team. When Production Engineers embed themselves in a development team, we expect the host team to treat them as a full member of the team. That means being involved in all the ceremonies, socials and meetings that the host team is involved in.
We make sure that each engagement is focussed on skills building in the host team while improving tooling and operations. It is essential that the host team understands there is an end date for a given engagement. It reinforces the fact that the host team needs to build the skills and the knowledge to support the system when their Production Engineer has moved on. This is important for the growth of the Production Engineers too: they want to be able to help the next team and innovate, not support a CI pipeline they wrote last year. That said, with the personal relationship formed by embedding, the host team has a direct line to escalate any significant issues to Production Engineering and get the support they need.
The other pillar, consulting, is centred around providing development teams with on-demand support. This support could be debugging a live incident or a more structured architecture review. Consulting is on the order of hours, whereas embedding is on the order of months.
One of the keys to making this new team successful was demonstrating value very early on. We were super pleased when our first engineer, Dan, joined. He was tasked with revamping a self-hosted CI/CD pipeline on the team he was embedded into. Using his previous experience, he was quickly able to transform their pipelines into a set of CircleCI pipelines. This allowed the team to focus less on managing their CI/CD infrastructure and enabled them to create further pipelines and iterate faster. He achieved this in his first few weeks and proved how valuable this new team was going to be.
We've now completed a handful of engagements across a range of products and technologies. We've worked with GCP, Kubernetes, AWS Lambdas & Step Functions, DataDog and CircleCI, to name a few we really value working with. We are starting to see a lot more consistency in our tech stacks, with many teams using CircleCI, Terraform and GKE (Kubernetes) as the base of their stack. We've seen a lot of reuse across these technologies that we weren't seeing before. The Production Engineering team has also become a community, with development teams coming to Prod Eng when they have questions or issues with infrastructure, monitoring, logging or any DevOps-related tooling. This is excellent because it allows us to connect teams who have solved similar issues or are experienced with certain tools.
On to the next six months: where do we go from here? We need more Production Engineers; that's what the development teams are saying to us! An excellent problem to have. Along with building the team, we have a keen focus on defining SLOs (service level objectives) across all products at OVO. We see SLOs, much as Google does, as the foundation for running software services. Measuring how successful the service is across a range of metrics is core to making rapid product and engineering decisions. In tandem, we are continuing to build out the foundations of a software organisation that can rapidly make changes in production. This means slick CI/CD pipelines and everything (infrastructure, IAM, pipelines, metrics, ...) as code. With these foundations, we can focus on optimising our deployment and testing speed and explore more cutting-edge techniques around continuous experimentation, such as A/B testing, canarying and automated rollbacks. Production Engineers are vital to shaping these technology choices. They are changing our culture by exploring and implementing these new tools and by socialising these ideas through embedding and consulting.
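To make the SLO idea a little more concrete, here is a minimal Python sketch of an availability SLO and its error budget. It is an illustration only, not how SLOs are implemented at OVO: the class name `AvailabilitySLO`, its methods and the 99.9% target are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilitySLO:
    """Hypothetical availability SLO (illustrative, not an OVO implementation).

    `target` is the fraction of requests that should succeed in a window,
    e.g. 0.999 for "99.9% of requests succeed".
    """

    target: float

    def __post_init__(self) -> None:
        # A target of exactly 1.0 leaves no error budget, so we reject it here.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")

    def error_budget(self, total_requests: int) -> float:
        """Number of requests that may fail in the window without breaching the SLO."""
        return (1.0 - self.target) * total_requests

    def budget_remaining(self, total_requests: int, failed_requests: int) -> float:
        """Fraction of the error budget still unspent (negative once overspent)."""
        if total_requests == 0:
            # No traffic means nothing has been spent yet.
            return 1.0
        return 1.0 - failed_requests / self.error_budget(total_requests)


if __name__ == "__main__":
    slo = AvailabilitySLO(target=0.999)  # assumed example target
    remaining = slo.budget_remaining(total_requests=1_000_000, failed_requests=250)
    print(f"Error budget remaining: {remaining:.0%}")
```

A team could use a number like this to decide whether to keep shipping quickly or to slow down and spend time on reliability, which is the kind of rapid product and engineering decision SLOs are meant to support.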
It's an exciting future for Production Engineering here at OVO. If you're interested, we have several open positions on the team.