
Kafka Consumer Testing with Burrow - Part 1

This is the first part of a series in which I will present a pattern for integration testing of Kafka consumers using Burrow and docker-compose. In this post we will cover how to build a common docker image that we will then use to run both Kafka and Zookeeper in a local docker-compose cluster. In the rest of the series we will continue by creating a simple smoke test to confirm that the consumer is successfully committing its offsets back to the Kafka cluster, giving us assurance that processing progress will not be lost.

The project associated with this series is available on GitHub here. In this post we will cover the contents of the first commit.

This is a cross-post from my personal blog, which you can follow here.

Motivations

Unit tests are invaluable tools for verifying that the internals of an application are working as expected, but when it comes to our application's interactions with other components over a network we need to take a different approach.

Tools like docker-compose allow us to easily set up a production-like environment in which to test at the system level, verifying the behaviour of the application over the network boundary where bugs often occur. Being able to do this on a local machine (as well as in a CI pipeline) shortens the development feedback loop, meaning that we catch bugs earlier, which in turn improves developer productivity.

Common Dockerfile

We define a multi-stage build to create our common image. In the build stage we download the Kafka release from the Apache archive and verify its checksum before extracting and installing it. In the following stage we build the common image that will be the base for both our Zookeeper and Kafka images: we copy the verified Kafka install over from the builder stage and install the dependencies needed to run it in the docker-compose environment:

# Build stage: download the Kafka release and verify its checksum.
FROM ubuntu:latest as builder

RUN apt-get update && apt-get -y dist-upgrade
RUN apt-get -y --no-install-recommends install \
    curl \
    ca-certificates

WORKDIR /tmp

# SHA512SUMS contains the expected digest of the Kafka archive.
COPY SHA512SUMS .

RUN curl -fsSL -o kafka_2.13-2.5.1.tgz https://archive.apache.org/dist/kafka/2.5.1/kafka_2.13-2.5.1.tgz

# Fail the build if the download does not match the expected digest.
RUN sha512sum --check SHA512SUMS

RUN tar -C /opt -zxf kafka_2.13-2.5.1.tgz


# Common image: runtime dependencies plus the verified Kafka install.
FROM ubuntu:latest

RUN apt-get update && apt-get -y dist-upgrade
RUN apt-get -y --no-install-recommends install \
    openjdk-11-jre-headless \
    wait-for-it \
    ncat && \
    apt-get clean

COPY --from=builder /opt/kafka_2.13-2.5.1 /opt/kafka

WORKDIR /opt/kafka

# Sleep forever, but exit promptly on SIGTERM/SIGINT so that a bare
# common container can be stopped cleanly.
CMD trap : TERM INT; sleep infinity & wait
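
The SHA512SUMS file we COPY into the builder stage is one we create ourselves and keep alongside the Dockerfile. As a rough sketch (assuming you trust the network for this one-off download), you can fetch the archive once and record its digest in the format that sha512sum --check expects:

# one-off, from within the directory with the common Dockerfile
$ curl -fsSL -O https://archive.apache.org/dist/kafka/2.5.1/kafka_2.13-2.5.1.tgz
$ sha512sum kafka_2.13-2.5.1.tgz > SHA512SUMS

Cross-checking the recorded digest against the .sha512 file published alongside the release on the Apache archive gives extra assurance that the first download was not tampered with.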

Zookeeper

Zookeeper provides a centralised service to manage synchronisation and configuration for a Kafka cluster. It is responsible for keeping track of broker node status, ACLs, and topic configuration, among other things. For a more in-depth discussion of Zookeeper's role in Kafka clusters I defer to this article.

Zookeeper Dockerfile

Our Zookeeper Dockerfile is now very simple: all it needs to do is build from the common image and override the CMD to run the Zookeeper startup script that comes bundled with the Kafka install, along with the default configuration defined in config/zookeeper.properties.

FROM common

CMD ["bin/zookeeper-server-start.sh", "config/zookeeper.properties"]

Kafka Dockerfile

Our Kafka Dockerfile is almost as simple as Zookeeper's: we similarly build from the common image and start the Kafka server using the bundled script. However, we also copy over a config directory containing server.properties, since we need to make a small change to the default configuration to tell the server where Zookeeper is running.

FROM common

COPY config/ ./config/

CMD ["bin/kafka-server-start.sh", "./config/server.properties"]

The version of Kafka that we installed in the common image contains a copy of server.properties populated with default values. We can make an editable local copy by building our common image, running a container from it, and copying the file out with docker cp:

# from within the directory with the common Dockerfile
$ docker build -t common .
$ docker run --rm -it --name common common bash

# from another shell outside of the common container
$ mkdir -p config
$ docker cp common:/opt/kafka/config/server.properties ./config/server.properties

The only change we need to make in this file is to update the value of zookeeper.connect to zookeeper:2181, since we will set the hostname of the Zookeeper container to zookeeper in docker-compose.
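
After the edit, the relevant line of config/server.properties looks like this:

# config/server.properties (excerpt)
# Connect to Zookeeper via the hostname assigned to its container in docker-compose.
zookeeper.connect=zookeeper:2181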

Configuring docker-compose

We define common, kafka, and zookeeper services in our docker-compose.yml file. Although we do not need a running instance of the common container for our tests, it is still specified here so that docker-compose build builds the image, which is the shared base of the other two services.

Kafka depends on Zookeeper for orchestration of broker nodes in the distributed system. Even though we have only one broker server in our example the dependency still exists, so we make it explicit to docker-compose using depends_on in conjunction with service_healthy and healthcheck. Note that service_healthy is not available in version 3.x of the compose file format, so make sure you are using 2.x. It is also worth noting that it is possible to configure docker-compose to run multiple broker servers by adjusting the scale parameter; however, we will not do so in this example, since the additional per-broker configuration required in server.properties goes beyond the scope of this series.

The health checks use the wait-for-it package that we installed in the common Dockerfile, which checks whether a port is open on localhost. In practice this means that once the zookeeper service is running, docker-compose will consider it healthy as long as port 2181 is open, after a startup grace period of 10 seconds. Once zookeeper passes its first health check, docker-compose will then start the kafka service.

version: "2.4"

services:

  common:
    image: common
    build:
      context: common/

  kafka:
    hostname: kafka
    build: broker/
    healthcheck:
      test: ["CMD", "wait-for-it", "--timeout=2", "--host=localhost", "--port=9092"]
      timeout: 2s
      retries: 12
      interval: 5s
      start_period: 10s
    depends_on:
      zookeeper:
        condition: service_healthy

  zookeeper:
    hostname: zookeeper
    build: zookeeper/
    healthcheck:
      test: ["CMD", "wait-for-it", "--timeout=2", "--host=localhost", "--port=2181"]
      timeout: 2s
      retries: 12
      interval: 5s
      start_period: 10s
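
You can reproduce a health check by hand against the running cluster. For example, the following runs the same wait-for-it command as the zookeeper health check inside its container, and should exit with status 0 once Zookeeper is accepting connections:

$ docker-compose exec zookeeper wait-for-it --timeout=2 --host=localhost --port=2181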

Running docker-compose

We are now ready to run our local Kafka setup with docker-compose up --build! You should see from the logs that both services start up without issue. When you are done, run docker-compose down to clean up the containers.
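
For an extra sanity check that the broker is listening, we can use the ncat tool we installed in the common image. Since the compose file above does not publish port 9092 to the host, we run it inside the kafka container:

$ docker-compose exec kafka ncat -z localhost 9092 && echo "broker is listening"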

If we had not overridden the default configuration for the Kafka server to specify where Zookeeper is running, we would have observed connectivity issues when the services started. Go ahead and change the value of zookeeper.connect to some other value, then rebuild the images. When you run docker-compose up again after the rebuild, you should see Kafka complain in the logs that it cannot connect to Zookeeper before failing and exiting with exit code 1.

Still to Come

In the next part of this series we will introduce Burrow and use it to run a very simple test. We will configure another service in docker-compose whose responsibility it will be to create a topic, produce a known quantity of messages to the topic, and consume a known quantity of messages from the topic. Burrow will be used to verify that production and consumption both occurred as expected.

Later in the series we will look at how to create a simple consumer using Scala and fs2-kafka, and how to test it with the docker-compose pattern. It is worth noting, however, that the choice of language and framework really is not important: as long as your producers and consumers run inside docker containers, you can use the pattern presented here.
