OVO Tech Blog

Our journey navigating the technosphere

Ian Whyman

Production Engineer at Ovo




Adventures with Dockerised Headed Chrome, Extensions and Selenium-python

The Customer Interaction Platform team here at OVO wanted to automate some manual post-release checks as part of their CI release stage. This post details some of the challenges of automating that process.

These checks were previously executed manually by an engineer following the post-release run-book. It's essentially a high-level integration test to ensure that the telephony, Data Protection Act verification and CRM systems are working together as expected. These three systems are tied together by a Chrome extension to provide a seamless user experience for our call centre agents.

As all three systems are browser based, thoughts immediately went to browser automation tools such as Cypress. However, because the checks involve multiple tabs and extensions, we needed to automate an entire browser to complete them fully.

The corporate browser here at OVO is Google's Chrome as we are heavy users of G-Suite and Meet internally.

The obvious choice from here is Selenium which, as it operates outside the browser rather than within it, can control things that cannot securely be accessed from within the browser, such as multiple tabs.

Headed Chrome in Docker

Running a headless Chrome in Docker is pretty much a solved problem: Chrome has first-class support for running headless, and Selenium ship pre-made Chrome-based Docker images. Indeed we initially used these images, until we discovered that headless Chrome does not support extensions.

The flow we were trying to automate relied on an in-house Chrome extension to function, so we were out of luck.

To run a full-blown headed Chrome we would need a display server. Running a full X server inside Docker seemed overly complex though, so it was Xvfb, a virtual framebuffer, to the rescue.

We also pass --no-sandbox and --disable-gpu to stop various warnings caused by resources (cgroups and graphics hardware) being unavailable in our Docker environment.


_tidy() {
  kill -TERM $chrome
  wait $chrome
  kill -TERM $xvfb
}

# Kill Chrome and Xvfb when we exit
trap _tidy SIGTERM


# Start Xvfb
echo "starting Xvfb"
Xvfb :99 -ac -screen 0 $XVFB_WHD -nolisten tcp -nolisten unix &
xvfb=$!

export DISPLAY=:99

/opt/google/chrome/chrome --no-sandbox --disable-setuid-sandbox --disable-gpu "$@" &
chrome=$!

wait $chrome
wait $xvfb
Chrome Xvfb wrapper script

We put this script in the location where chromedriver expects the Chrome binary, /usr/bin/google-chrome. It ensures that Chrome can start, and that Xvfb is killed if Chrome exits.

Loading Extensions

Python Selenium can provide chromedriver with a list of packed extensions, which will be loaded into the temporary profile used by Chrome. In our testing this didn't work with unpacked extensions, and the internal extension is unpacked.

To solve this we downloaded the internal extension to a known path at Docker build time, and loaded it in as an unpacked extension with the --load-extension=/path/to/extension argument to Chrome.
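Chrome accepts a comma-separated list of directories in a single --load-extension flag. As a sketch (the helper name and the /opt/extensions/cip path are our illustration, not the production code), the argument can be built and handed to Selenium like so:

```python
def chrome_extension_args(*paths):
    """Build Chrome's --load-extension argument for one or more
    unpacked extension directories (comma-separated)."""
    return "--load-extension=" + ",".join(paths)

# With Selenium this would be passed via ChromeOptions, e.g.:
#   options = webdriver.ChromeOptions()
#   options.add_argument(chrome_extension_args("/opt/extensions/cip"))
#   driver = webdriver.Chrome(options=options)
```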


Now we were ready to start automating the process. The first task was to authenticate to the systems, all of which used different mechanisms.

Selenium and Salesforce 2FA

Our Salesforce instance enforces 2FA via email. Usually this is a pretty infrequent prompt, but as the tests start with an empty browser profile, our cookies get lost each time. This meant we would have to provide a numerical PIN that is emailed to the account address at login time.

Enquiries were made as to whether we could disable the second factor on this account for ease of integration, but reports from the team suggested it was a globally acting option only.

So a workaround was needed. It seemed simple enough: connect to the inbox via IMAP, wait for an email from Salesforce, load the email, grab the code via a regex, then pass it back into Selenium.

import email
import email.utils
import imaplib
import re


def get_salesforce_confirmation_code(
    user, password, cutoff_date, imap="imap.gmail.com"
):
    mail = imaplib.IMAP4_SSL(imap)
    mail.login(user, password)
    mail.select("inbox")

    # search and return uids
    result, data = mail.uid("search", '(SUBJECT "Verify your identity in Salesforce")')
    latest_email_uid = data[0].split()[-1]
    result, data = mail.uid("fetch", latest_email_uid, "(RFC822)")
    raw_email = data[0][1]

    msg = email.message_from_bytes(raw_email)

    # Check this is a recent email
    msg_date = email.utils.parsedate_to_datetime(msg["date"])
    if msg_date < cutoff_date:
        raise NoSuitableMailError

    if msg.is_multipart():
        for part in msg.walk():
            content_type = part.get_content_type()
            content_disposition = str(part.get("Content-Disposition"))

            # keep the text/plain body, skipping any attachments
            if content_type == "text/plain" and "attachment" not in content_disposition:
                body = part.get_payload(decode=True)
    else:
        # not multipart - i.e. plain text, no attachments
        body = msg.get_payload(decode=True)

    match = re.search(ID_CODE_REGEX, body.decode("UTF-8"))
    return match.group(1)
Salesforce 2FA with Selenium
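The helper raises NoSuitableMailError when no sufficiently recent email has arrived, so the "wait for an email" step is just a retry loop around it. A minimal sketch, with names of our own choosing:

```python
import time


class NoSuitableMailError(Exception):
    """Raised when no recent Salesforce verification email is found."""


def wait_for_code(fetch_code, timeout=120, interval=5):
    """Poll fetch_code (e.g. the IMAP helper above) until it yields a code,
    retrying on NoSuitableMailError until the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fetch_code()
        except NoSuitableMailError:
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)
```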

Selenium and Basic Authentication

Another unexpected challenge: the login to the telephony system was protected by Basic Auth. Due to recent security enhancements in Chrome, the tried and tested user:pass@domain.com syntax is no longer allowed as of Chrome 59.


Surprisingly, there is also no way from Selenium to access the pop-up authentication box you receive as a user, or to inject a custom header (so we could send the Authorization header ourselves).

In the end we used another Chrome extension to fill in the authentication details, again loaded via an argument to Chrome at start time. This second, small extension is generated dynamically and adds an authentication listener, e.g. chrome.webRequest.onAuthRequired.addListener(). The listener returns the appropriate login information, as supplied by environment variables injected from the Docker environment.

Dealing with multiple tabs

As the tests were written as a pytest pack, we made each tab available to the tests via a fixture. We store a list of the window IDs and then switch to the relevant tab before the test is run; an example snippet is below:

@pytest.fixture
def crm_tab_driver(driver):
    driver.switch_to.window(window_ids["crm"])
    return driver
Tab switching fixture
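The window IDs themselves come from driver.window_handles once all the tabs are open; pairing them with logical names keeps the fixtures readable. A sketch, with illustrative tab names of our own:

```python
def map_tabs(handles, names=("telephony", "dpa", "crm")):
    """Pair Selenium window handles with logical tab names, in open order."""
    if len(handles) != len(names):
        raise ValueError("unexpected number of open tabs")
    return dict(zip(names, handles))

# After opening all tabs:
#   window_ids = map_tabs(driver.window_handles)
```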

This places an inherent limit on how many tests you can run at once, as only one tab can be active at a time.


When developing with Selenium locally you get the relative luxury of being able to actually see the state of the browser and its contents, a luxury which is very much not available when your Docker image is running in Kubernetes somewhere!

Selenium has good support for taking screenshots, so we had the script upload them to Google Cloud Storage at various stages.

To assist with debugging we also upload to the same bucket a copy of the DOM state at exit in case of an error, a copy of the formatted dev-console logs, and finally the chromedriver log itself.
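A sketch of the upload side, assuming the google-cloud-storage client library and a bucket layout of our own invention:

```python
def debug_blob_name(run_id, artifact):
    """Object path for one debug artifact from one test run."""
    return "selenium-debug/{}/{}".format(run_id, artifact)


def upload_debug_artifact(bucket_name, run_id, artifact, data):
    """Upload screenshot/DOM/log bytes to the debug bucket."""
    # Imported lazily so the path helper stays usable without GCS installed.
    from google.cloud import storage
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(debug_blob_name(run_id, artifact))
    blob.upload_from_string(data)
```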

browser_logs = ""
for entry in driver.get_log("browser"):
    browser_logs += "{timestamp} {source} {level} {message}\n".format(**entry)
Build a plain text log from the browser console logs

A final quick note on chromedriver logs: if the parent directory of the path you set for service_log_path does not exist when chromedriver starts, it will exit with no error message! Hopefully that titbit will save you some minutes of confusion.

if not os.path.exists(LOG_DIR):
    os.makedirs(LOG_DIR)

driver = webdriver.Chrome(
    service_log_path=os.path.join(LOG_DIR, "chromedriver.log"),
    service_args=["--verbose", "--whitelisted-ips"],
)
Make sure to create the chromedriver log directory!

As you can see, what seems like a conceptually simple task has many layers and problems to solve as you work through them. Hopefully these notes give an idea of the kinds of things that need navigating when coming up with working solutions with real-world software.
