This is the first part of a series in which I will present a pattern for integration testing of Kafka consumers using Burrow and docker-compose. In this post we will cover how to build a common docker image that we will then use to run both Kafka and Zookeeper in a local docker-compose cluster. In the rest of the series we will continue by creating a simple smoke test to confirm that the consumer is successfully committing its offsets back to the Kafka cluster, giving us assurance that processing progress will not be lost.
The project associated with this series is available on GitHub here. In this post we will cover the contents of the first commit.
This is a cross-post from my personal blog, which you can follow here.
Unit tests are invaluable tools for verifying that the internals of an application are working as expected, but when it comes to our application's interactions with other components over a network we need to take a different approach.
Tools like docker-compose allow us to easily set up a production-like environment in which to test at the system level, verifying the behaviour of the application over the network boundary where bugs often occur. Being able to do this on a local machine (as well as in a CI pipeline) shortens the development feedback loop, meaning that we catch bugs earlier, which in turn improves developer productivity.
We define a multi-stage build to create our `common` image. In the build stage we download and verify the Kafka distribution from the Apache archive before extracting and installing it. In the following stage we build the `common` image that will be the base from which we build our Zookeeper and Kafka images, copying over the verified install of Kafka from the builder image and installing the dependencies needed to run it in the docker-compose environment:
- `openjdk-11-jre-headless`: the Java runtime environment, needed to run Kafka and Zookeeper.
- `wait-for-it`: used to configure docker-compose health checks.
- `ncat`: used to open network ports on our containers to signal to docker-compose that they are healthy.
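As a sketch of how these two tools fit together (the port `9999` here is purely illustrative): a container can hold a TCP port open with `ncat` once it considers itself ready, and a health check can then probe that port with `wait-for-it`, mirroring the checks we will define in docker-compose later on.

```shell
# Illustrative only: keep a TCP port open to signal readiness...
$ ncat -lk 9999 &

# ...and probe it with a wait-for-it health check (2 second timeout)
$ wait-for-it --timeout=2 --host=localhost --port=9999
```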
```dockerfile
FROM ubuntu:latest as builder

RUN apt-get update && apt-get -y dist-upgrade
RUN apt-get -y --no-install-recommends install \
    curl \
    ca-certificates

WORKDIR /tmp
COPY SHA512SUMS .
RUN curl -fsSL -o kafka_2.13-2.5.1.tgz https://archive.apache.org/dist/kafka/2.5.1/kafka_2.13-2.5.1.tgz
RUN sha512sum --check SHA512SUMS
RUN tar -C /opt -zxf kafka_2.13-2.5.1.tgz

FROM ubuntu:latest

RUN apt-get update && apt-get -y dist-upgrade
RUN apt-get -y --no-install-recommends install \
    openjdk-11-jre-headless \
    wait-for-it \
    ncat && \
    apt-get clean all

COPY --from=builder /opt/kafka_2.13-2.5.1 /opt/kafka

WORKDIR /opt/kafka
CMD trap : TERM INT; sleep infinity & wait
```
Zookeeper provides a centralised service to manage synchronisation and configuration for a Kafka cluster. It is responsible for keeping track of the status of broker nodes, ACLs, and topic configuration, among other things. For a more in-depth discussion of Zookeeper's role in Kafka clusters, I defer to this article.
Our Zookeeper Dockerfile is now very simple: all it needs to do is build from the `common` image and override the `CMD` to run the Zookeeper startup script bundled with the Kafka download that we installed, along with the default configuration settings defined in `config/zookeeper.properties`:
```dockerfile
FROM common

CMD ["bin/zookeeper-server-start.sh", "config/zookeeper.properties"]
```
Our Kafka Dockerfile is almost as simple as Zookeeper's: we similarly build from the `common` image and start the Kafka server using the bundled script. However, we also copy over a `config` directory containing `server.properties`, since we need to make a small change to the default configuration to tell the server where Zookeeper is running.
```dockerfile
FROM common

COPY config/ ./config/

CMD ["bin/kafka-server-start.sh", "./config/server.properties"]
```
The version of Kafka that we installed in the `common` image contains a copy of `server.properties` populated with default values, so we can make a local copy that we can edit by building our `common` image, running a container from it, and copying the file out with `docker cp`:
```shell
# from within the directory with the common Dockerfile
$ docker build -t common .
$ docker run --rm -it --name common common bash

# from another shell outside of the common container
$ mkdir -p config
$ docker cp common:/opt/kafka/config/server.properties ./config/server.properties
```
The only change we need to make in this file is to update the value of `zookeeper.connect` to `zookeeper:2181`, since we will set the hostname of the Zookeeper container to `zookeeper` in docker-compose.
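After the edit, the relevant line in `config/server.properties` looks like this (the default shipped with Kafka is `localhost:2181`):

```properties
# Point the broker at the Zookeeper container defined in docker-compose
zookeeper.connect=zookeeper:2181
```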
Next we define the `common`, `kafka`, and `zookeeper` services in our `docker-compose.yml` file. Although we do not need a running instance of the `common` container for our tests, it is still specified here because we want `docker-compose build` to build the image, since it is the shared base image of the other two services.
Kafka depends on Zookeeper for orchestration of broker nodes in the distributed system. Even though we have only one broker server in our example the dependency still exists, so we make it explicit to docker-compose using `depends_on` in conjunction with `healthcheck`. Note that the `service_healthy` condition is not available in version `3.x` of the docker-compose file format, so make sure you are using `2.x`. It is also worth noting that it is possible to configure docker-compose to run multiple broker servers by adjusting the `scale` parameter; however, we will not do so in this example, since that requires additional per-broker configuration in `server.properties`, which is beyond the scope of this series.
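For reference, in the `2.x` compose file format a fixed number of broker containers can be requested with the `scale` key. A hypothetical sketch (which would additionally need unique per-broker settings such as `broker.id`):

```yaml
kafka:
  hostname: kafka
  build: broker/
  scale: 3  # run three broker containers; requires per-broker configuration
```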
The health checks use the `wait-for-it` package that we installed in the `common` Dockerfile, which checks whether a port is open on localhost. In practice this means that when the `zookeeper` service is running, docker-compose will consider it to be in a healthy state as long as port `2181` is open, after a startup grace period of 10 seconds. Once `zookeeper` passes its first health check, docker-compose will then start the `kafka` service:
```yaml
version: "2.4"

services:
  common:
    image: common
    build:
      context: common/

  kafka:
    hostname: kafka
    build: broker/
    healthcheck:
      test: ["CMD", "wait-for-it", "--timeout=2", "--host=localhost", "--port=9092"]
      timeout: 2s
      retries: 12
      interval: 5s
      start_period: 10s
    depends_on:
      zookeeper:
        condition: service_healthy

  zookeeper:
    hostname: zookeeper
    build: zookeeper/
    healthcheck:
      test: ["CMD", "wait-for-it", "--timeout=2", "--host=localhost", "--port=2181"]
      timeout: 2s
      retries: 12
      interval: 5s
      start_period: 10s
```
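Before starting anything, it can be useful to validate the file with `docker-compose config`, which parses the YAML and prints the resolved configuration (or an error if the file is invalid):

```shell
# Validate and print the resolved compose configuration
$ docker-compose config

# List just the service names
$ docker-compose config --services
```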
We are now ready to run our local Kafka setup with `docker-compose up --build`! You should see from the logs that both services start up without issue. When you are done, run `docker-compose down` to clean up the containers.
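While the cluster is up, you can inspect the health state that docker-compose has recorded for each container, and probe the broker port from inside its container (the 5 second timeout here is arbitrary):

```shell
# Show the state of each container; healthy services report "Up (healthy)"
$ docker-compose ps

# Probe the Kafka listener port from inside the broker container
$ docker-compose exec kafka wait-for-it --timeout=5 --host=localhost --port=9092
```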
If we had not overridden the default configuration for the Kafka server to specify where Zookeeper is running, we would have observed connectivity issues when the services started. Go ahead and change the value of `zookeeper.connect` to some other value and rebuild the images. You should see that when you run `docker-compose up` again after the rebuild, Kafka complains in the logs that it cannot connect to Zookeeper before failing and exiting with an error code.
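To inspect the failure without the noise of the other services, the broker's logs can be viewed in isolation:

```shell
# Show only the Kafka service's logs from the most recent run
$ docker-compose logs kafka
```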
Still to Come
In the next part of this series we will introduce Burrow and use it to run a very simple test. We will configure another service in docker-compose whose responsibility it will be to create a topic, produce a known quantity of messages to the topic, and consume a known quantity of messages from the topic. Burrow will be used to verify that production and consumption both occurred as expected.
Later in the series we will look at how to create a simple consumer using Scala and fs2-kafka, and how to test it with the docker-compose pattern. It is worth noting, however, that the choice of language and framework really is not important: as long as your producers and consumers run inside docker containers, you can use the pattern presented here.