This blog post is about how OVO implemented PSD2 (the second Payment Services Directive), a European directive that, among other things, aims to make online card payments more secure in order to reduce fraud. At the heart of its security requirements is Strong Customer Authentication or SCA (explained below), which adds extra checks that someone using a card is its rightful owner.
We had to write new card payment systems from scratch for all of the OVO Group companies that take card payments because the existing solution didn’t support PSD2. We needed to have a well-tested, reliable system in place by the deadline of the 14th of September 2019 to avoid the possibility of a significant proportion of our card payments starting to fail (!) post-deadline.
This post describes how we did it. It’s written by developers, so there is quite a lot of technical content!
What is SCA?
In the beginning was the username and password. If you have a strong password, that’s not a bad way of securing assets, unless your password is somehow compromised. So along came MFA (multi-factor authentication), which requires extra proof of identity (i.e. more than one “factor”). Factors are things you know (e.g. your username and password, a PIN or the details of a credit card), things you have (e.g. a smartphone or RSA hardware key) and things you are (e.g. your thumbprint or voice).
Tech companies such as Google and Amazon have been using MFA to secure accounts for years, but banks have been slower to adopt the technology. This is now changing.
Banks have fraud risk assessment systems that monitor usage of cards and try to detect fraudulent activity. They need to strike a balance between making the payment process easy (type in card details, hit “Pay”) and preventing fraud, so a “challenge” (e.g. asking for a code that has been sent to the cardholder’s smartphone) is only made when fraud is suspected.
As of the 14th of September 2019, all companies handling card payments were required to have systems capable of supporting challenges.
What were the requirements?
After doing due diligence on several potential providers, we chose to go with Worldpay. This involved fundamental changes to the way we take card payments, chief among them being:
- Rather than taking card details on our websites and sending them securely to a third party payment endpoint, we needed to embed a “hosted payment page” (HPP) hosted on Worldpay’s domain instead.
- The previous process was synchronous (submit payment details and get an immediate yay or nay in return), but the PSD2 process is asynchronous.
The HPP approach outsources maintenance of the payment page to Worldpay, allowing them to iterate on customer experience, security, etc. It also means that the card details are not handled by OVO, which reduces the scope of our PCI DSS compliance obligations. Code in our websites only has to “initialise” a payment and redirect to a URL on Worldpay’s domain that hosts the payment page. Once the user has entered their card details, potentially been challenged and hopefully hit “Submit”, Worldpay redirects back to OVO. The URL redirected to differs depending on whether the user completed the payment successfully, abandoned the transaction or had the transaction rejected by the bank that issued the card.
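To make the three return routes concrete, here is a minimal sketch in Scala of how a website might build one return URL per outcome and work out which outcome a redirect represents. Every name in it (`RedirectOutcome`, `HostedPaymentPage`, the URL layout) is hypothetical and chosen for illustration; it is not Worldpay’s request format or our production code.

```scala
// The three ways the user can be handed back to us after the hosted payment page.
sealed trait RedirectOutcome
case object Completed extends RedirectOutcome
case object Abandoned extends RedirectOutcome
case object Rejected extends RedirectOutcome

object HostedPaymentPage {
  // Assumed URL layout: <baseUrl>/payments/<paymentId>/<segment>
  private val segmentFor: Map[RedirectOutcome, String] = Map(
    Completed -> "complete",
    Abandoned -> "abandoned",
    Rejected  -> "rejected"
  )

  // Builds the URL we would ask the payment provider to redirect to for a given outcome.
  def returnUrl(baseUrl: String, paymentId: String, outcome: RedirectOutcome): String =
    s"$baseUrl/payments/$paymentId/${segmentFor(outcome)}"

  // Recovers the outcome from the last path segment of a return URL, if it is one we know.
  def outcomeOf(url: String): Option[RedirectOutcome] = {
    val lastSegment = url.split('/').lastOption.getOrElse("")
    segmentFor.collectFirst { case (outcome, segment) if segment == lastSegment => outcome }
  }
}
```

In a sketch like this the redirect only decides which page the user sees next; it would not be used to decide whether money has actually moved.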
The front end flow is shown in the diagram below. Globes are what the user sees in the browser and cogs represent our backend service. The user enters their credit card details into a page hosted by Worldpay.
The PSD2 process is inherently asynchronous because our front end code is not coupled to our financial systems. Once the outcome of a transaction is known, Worldpay hit a “webhook” endpoint linked to our backend systems. This “event” then tells us whether the payment succeeded and hence whether to credit the customer’s balance.
How did we build it?
We decided to have completely separate infrastructure for each of the OVO Group companies that take card payments (known as “merchants” in the world of card payment providers like Worldpay). This means that a failure in a component for one merchant (e.g. a RESTful API service) can’t affect another merchant. It also allows us to onboard new merchants quickly by spinning up new infrastructure and configuring it.
We hosted everything on AWS and wrote all the code in Scala. The system for a given merchant is split into two major components:
- An always-on service hosted in Elastic Beanstalk that accepts “initiate payment” requests, talks to Worldpay and returns a “redirect URL” for the HPP for the transaction. (This is represented by a set of cogs in the front end flow diagram above.)
- An event-driven webhook endpoint exposed using API Gateway that triggers a Lambda function to process events in the card payment lifecycle (requested, authorised, declined, charged back, etc.).
The card payment events are streamed using Kinesis, persisted to a permanent data store for analytical purposes and published to Kafka so that merchant teams can consume them and credit customer balances.
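As an illustration of how a consuming team might act on these lifecycle events, the sketch below models them as a sealed trait and maps each one to a balance adjustment. The event names follow the lifecycle listed above, but the fields, the use of pence and the rule that an authorised payment credits the balance while a chargeback reverses it are assumptions made for the example, not a description of the real event schema.

```scala
// Hypothetical model of the card payment lifecycle events published to Kafka.
sealed trait CardPaymentEvent { def paymentId: String }
final case class Requested(paymentId: String, amountPence: Long) extends CardPaymentEvent
final case class Authorised(paymentId: String, amountPence: Long) extends CardPaymentEvent
final case class Declined(paymentId: String) extends CardPaymentEvent
final case class ChargedBack(paymentId: String, amountPence: Long) extends CardPaymentEvent

object BalanceAdjustment {
  // Returns the change (in pence) to apply to the customer's balance, if any.
  // Assumption: only an authorisation credits the balance and only a chargeback debits it.
  def forEvent(event: CardPaymentEvent): Option[Long] = event match {
    case Authorised(_, amountPence)  => Some(amountPence)
    case ChargedBack(_, amountPence) => Some(-amountPence)
    case _: Requested | _: Declined  => None
  }
}
```

Because the trait is sealed, the compiler can warn when a new event type is added but not handled, which is a useful property when several merchant teams consume the same stream.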
The event driven flow is shown in the diagram below.
How did we test it?
At OVO, we believe in testing! We do several types of testing:
- Unit testing of individual software components (e.g. classes), running in a single process in memory, with mock dependencies as appropriate
- Integration testing of a given system, e.g. hitting the webhook endpoint and making sure that the correct event appears on Kafka
- End-to-end testing of the entire system
Our end-to-end tests proved particularly valuable during the development process. We call the always-on service to initiate a payment for a test merchant in our UAT environment and then use a browser automation tool (Cypress) to go to the returned redirect URL, fill in the payment details and hit “Submit”.
We use test card details that mean the payment will succeed without a challenge and then wait for the correct event to appear on Kafka.
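The “wait for the correct event” step boils down to polling with a timeout. Below is a minimal sketch of such a helper; `Eventually.awaitMatch` is a hypothetical name, and `fetch` stands in for whatever reads the latest records from the topic (in a real test it would wrap a Kafka consumer, which is left out here).

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

object Eventually {
  // Repeatedly calls `fetch` until one of the returned items satisfies `isExpected`,
  // or gives up with None once `timeout` has elapsed.
  def awaitMatch[A](timeout: FiniteDuration, interval: FiniteDuration)(
      fetch: () => Seq[A]
  )(isExpected: A => Boolean): Option[A] = {
    val deadline = timeout.fromNow

    @tailrec
    def loop(): Option[A] =
      fetch().find(isExpected) match {
        case found @ Some(_)              => found
        case None if deadline.isOverdue() => None
        case None =>
          Thread.sleep(interval.toMillis) // back off before polling again
          loop()
      }

    loop()
  }
}
```

A test would then assert that the result is defined, so that a missing event fails with a clear timeout rather than hanging the build.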
What were the challenges (no pun intended)?
We had both organisational and technical challenges during the 10 week implementation of the new systems.
The team building the back end services was distributed between London and Bristol, meaning some (not unwelcome) travel and a lot of remote pairing, which worked really well. The backend team also had to liaise with the merchant front end teams, of which there were 4. We had amazing help from OVO’s Project Management Team, leaving the developers to integrate with Worldpay’s API, build infrastructure, write code, write unit tests, deploy to non-prod, write integration and end-to-end tests and finally deploy into production.
On the technical side, the main challenge was around “tokenised payments” which applied to one of the merchants. This is where a card is saved in Worldpay’s system in return for a token that is then stored by OVO. The token can be used to make repeat payments without the customer having to continually re-enter their details. We use this for customers who consume energy on a pay-as-you-go basis, and so regularly top up their balance with a card.
We had to migrate hundreds of thousands of tokens from the old system to the new one, and we had to make sure we got it right! Imagine seeing lots of transactions on your credit card that you didn’t make because the tokens weren’t mapped correctly. You would be paying for someone else’s energy and would probably be quite miffed.
We did this by registering new tokens in both systems until we were sure the data had been migrated with the correct mapping between customer and token. We also did lots of redundant checks to make sure the mapping was correct.
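One of the simplest redundant checks is to compare the customer-to-token mapping you expect with the one actually held in the new system and list every difference. The sketch below shows the idea; the types and the `TokenMigrationCheck.compare` function are hypothetical, and it assumes both mappings can be loaded as in-memory maps keyed by customer ID, which glosses over how the data was really extracted.

```scala
// Kinds of difference between the expected and actual customer -> token mappings.
sealed trait Discrepancy
final case class Missing(customerId: String) extends Discrepancy
final case class Mismatched(customerId: String, expectedToken: String, actualToken: String) extends Discrepancy
final case class Unexpected(customerId: String) extends Discrepancy

object TokenMigrationCheck {
  // Compares the two mappings and returns every discrepancy, ordered by customer ID
  // within each group (missing or mismatched first, then unexpected).
  def compare(expected: Map[String, String], actual: Map[String, String]): List[Discrepancy] = {
    val missingOrWrong: List[Discrepancy] =
      expected.toList.sortBy(_._1).flatMap { case (customerId, token) =>
        val result: Option[Discrepancy] = actual.get(customerId) match {
          case None                          => Some(Missing(customerId))
          case Some(found) if found != token => Some(Mismatched(customerId, token, found))
          case Some(_)                       => None
        }
        result
      }

    // Customers that have a token in the new system but should not.
    val unexpected: List[Discrepancy] =
      (actual.keySet -- expected.keySet).toList.sorted.map(Unexpected(_))

    missingOrWrong ++ unexpected
  }
}
```

An empty result is the only acceptable outcome before switching over: a single `Mismatched` entry would mean one customer’s token could charge another customer’s card.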
How is it all going after the switch?
Of course, we rigorously tested our non-prod deployments and also tested “friendly” payments in Production before the 14th of September deadline, but it was still a nervous moment when we started routing significant volumes of card payments through the Production systems. We didn’t see any errors by close of play on the 14th of September, so we decided to keep transactions going through the new system and check in the morning.
We were greeted on the Saturday morning by a steady stream of events flowing through Kafka. We checked logging, monitoring and alerting for both the always-on service and the webhooks for all 4 merchants and everything was healthy.
Since then, the system has operated without any need for intervention from developers or Production Support.
Such a good outcome is down to the people involved in the project. A high standard was always maintained in both technical and non-technical work and everyone was willing to communicate, travel, help others and “go the extra mile” whenever necessary.
On the technical side, we maintained an “everything as code” policy, which meant Infrastructure as Code, full testing and declarative CI/CD pipelines. Avoiding any manual processes helped us to iterate rapidly and safely. We disseminated knowledge among the teams through pair programming and code reviews.
On the non-technical side, project managers and product managers supported the teams, listened to the developers, managed stakeholders and liaised with third parties to keep specifications and timelines constantly up-to-date.
All of this meant that we completed a high-risk, high-profile deployment in a short timeframe without any incidents. We’re proud of what we achieved and really enjoyed the celebration pizza and drinks evening the week after the go-live :-)