OVO Tech Blog

Writing a Helm Plugin

Introduction

Chris Every


Production Engineering devops kubernetes helm

Posted by Chris Every on .

If you’re using Helm as your Kubernetes package manager, it’s likely the command-line tool already fulfils your use-cases, as it ships with a lot of functionality. One thing that’s missing, though, is snapshot/restore. Existing commands such as install, delete and rollback make it easy to manipulate existing packages in a Kubernetes cluster, but what happens if the packages, or worse, the whole cluster, are accidentally deleted? We want to be able to completely recover a cluster, and having a snapshot of Helm’s state is an important part of achieving this. Keeping track of which packages and versions are installed at any given point can be tricky. When you mix in values that can be set at install time, cluster recovery suddenly becomes complex.

Extending Helm

We wanted to continue using Helm, but we needed to extend it to achieve our aim of automated recovery. Fortunately, there are several options.

Perhaps the most crude option is to write a wrapper script around Helm. If your extension to Helm is extremely basic this could be a quick win, but anything more than that and scripting could get pretty messy. Furthermore, if you’ve got a reasonable number of teams that may use the extension, in my experience you’ll end up with a myriad of branches of essentially the same script.

You could fork the Helm GitHub repo, modify Helm core and build from source. This is better; at least now you’d have a build that you could distribute to other teams if needed, or they could fork from you and then build from source themselves. Grokking the bits of Helm core that you need to interact with could be a little challenging, depending on the functionality you’re adding. The risk profile, however, increases substantially with this option. You’re modifying a tool that’s heavily used in deploying to production. Additionally, you’ll either drift further from upstream/master and not have any security patches or bug fixes, or you’ll introduce the burden of rebasing and dealing with any conflicts that arise.

Helm’s Ecosystem

Another option for extending Helm is to make the changes in the aforementioned fork and submit a PR to the Helm Git repo. In just over 3 years since its public launch, Helm has grown from an internal hackathon project at Deis (creators of various container-orchestration tools, acquired by Microsoft in April 2017) into a popular tool for Kubernetes app management. In the results of an official Kubernetes Application Survey released in April 2018, 64% of the application developers, application operators, and ecosystem tool developers who answered reported that they use Helm. The Helm project has a large ecosystem of contributors, and is currently at an incubating maturity level with the CNCF (Cloud Native Computing Foundation). As a CNCF hosted project:

“Helm is part of a neutral foundation aligned with its technical interests, as well as the larger Linux Foundation, which provide the project with governance, marketing support and community outreach.”

The CNCF maintains graduation criteria for the various maturity levels it defines. Given Helm’s level of maturity, it’s understandable that the maintainers take their contribution/approval process very seriously. It may be some time before your branch is merged into master. During this time, you’ll want to keep your fork/build up-to-date with security and bug fixes in master, which can bring considerable toil.

Helm’s Plugin Model

Fortunately for us, Helm has a concept of plugins. Akin to Git’s plugin model, it’s possible to register a plugin with your local Helm tool, which Helm invokes when you ask it to. It’s important to stress that there are no changes required within the tool’s source code in order to get this to work. Helm will blindly invoke the plugin, which is completely separated from the Helm tool, and can be written in any language. For example, it could be a Golang binary that internally uses some of the same packages as Helm to talk to Tiller. It could be a bash script that simply calls the Helm command-line tool and processes some output. It doesn’t even have to interface with Helm at all (although it almost certainly will).
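To illustrate the "bash script that calls the Helm command-line tool and processes some output" case, here is a minimal sketch of such a plugin script. It is not taken from any real plugin: the function name count_releases is made up for illustration, and the sketch assumes Helm exposes the path to its own binary as HELM_BIN and the plugin directory as HELM_PLUGIN_DIR when it invokes a plugin.

```shell
# Count release names read from standard input, one per line.
# (count_releases is a hypothetical helper, not part of Helm.)
count_releases() {
    # grep -c prints the number of non-empty lines; "|| true" keeps the
    # exit status at zero when there are no releases at all.
    grep -c . || true
}

# Only talk to Helm when the script is actually invoked as a plugin,
# which this sketch detects via HELM_PLUGIN_DIR being set.
if [ -n "${HELM_PLUGIN_DIR:-}" ]; then
    # "helm ls --short" prints release names only, one per line.
    "${HELM_BIN:-helm}" ls --short | count_releases
fi
```

Pointing the command field of a plugin.yaml at a script like this would be enough for Helm to run it; everything else is ordinary shell.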

Benefits of Helm’s Plugin Model

Before we dive into the detail of a Helm plugin that we’ve written at OVO, let’s take a look at what Helm’s plugin model allows us to do. We’re able to get off the ground quickly with the new functionality we’re adding, following our own contribution/approval processes. Because the Helm tool ships with a helm plugin install command (that can be given a Git repo URL), we can provide a single Git repo for any potential users, who can install the plugin with a single command. This repo can be private; the only requirement is that the user trying to install the plugin has read access to it. We don’t need to rebuild the Helm tool ourselves, and users can continue receiving security and bug fixes via official releases.

Helm’s plugin model allows us to dogfood the plugin that we developed. If we then open-sourced it, we could potentially build up a decent ecosystem of users and contributors. If the plugin is written in the same language as Helm (Golang), we could request for it to be merged into Helm core. If we were to do so, an existing diverse ecosystem already behind our plugin could prove valuable.

Disaster Recovery and Mean Time To Recovery

“Availability is a combination of how often things are unavailable and how long they remain that way” -- Baron Schwartz.

Mean time to recovery (MTTR) is a term used to describe the second part of that equation, referring to the amount of time your application is unavailable for. It’s likely that your applications’ acceptable levels of availability are defined in their SLOs, so their MTTR is something you’ll want to be aware of.

If you have any periods of unavailability in an application, then you’ll be able to calculate the MTTR. If you’re lucky enough to have 100% availability then you simply won’t know what the MTTR is. In both scenarios, practicing Disaster Recovery (DR) can help you to improve or to calculate an MTTR respectively.

The Helm-Bulk Plugin

At OVO, we use Helm across multiple teams. We want to reduce the amount of time and toil involved in fully recovering Kubernetes clusters, in order to reduce MTTR directly and make the practice of DR more attractive to engineers. Whilst making DR more appealing to engineers doesn’t necessarily lead to it being practiced more often, it certainly helps. We can implement a Helm plugin to automate recovery of not only our internal applications in clusters, but third party applications too.

We developed a Go program to handle three operations. “Save” stores the current list of Helm releases and their values in a tar.gz file. “Load” installs the releases defined in that file into the cluster. “Show” prints out the releases and values stored in the tar.gz file.

Worthy of mention here is the “useTunnel” functionality Helm provides. If it’s set to true (in the plugin.yaml, described below), Helm will open a tunnel to your chosen Kubernetes cluster and set a “TILLER_HOST” environment variable containing the local address of the tunnel. Plugins can then use this local address to talk to Tiller. We were unaware of this in the early stages of development, so we ended up implementing the same tunnel-opening functionality in our plugin. This ultimately led to requiring lots of Kubernetes dependencies, which is a story for another blog. One side-effect of depending on Helm for the tunneling is that, for local dev testing, you’ll need to point Helm (see plugin.yaml, below) at your local builds.
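As a rough sketch of how a plugin might consume that variable, the snippet below reads TILLER_HOST and fails with a hint when it is missing. The function name tiller_address and the error message are illustrative assumptions, not part of Helm or of helm-bulk.

```shell
# Print the Tiller address that Helm provides through TILLER_HOST,
# or fail with a hint if the variable is missing.
# (tiller_address is a hypothetical helper for this sketch.)
tiller_address() {
    if [ -z "${TILLER_HOST:-}" ]; then
        echo "TILLER_HOST is not set; is useTunnel enabled in plugin.yaml?" >&2
        return 1
    fi
    printf '%s\n' "${TILLER_HOST}"
}
```

A Go plugin would do the equivalent by reading the same environment variable with os.Getenv before creating its Tiller client.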

Plugin Installation

Once we’d finished the Go program, it was time to switch our attention to the install process. As previously mentioned, the helm plugin install command can handle installation of a Helm plugin from a Git repository in a single command, which is super-useful. A plugin.yaml file must exist in the root of the specified Git repo. This file provides Helm with, amongst other things, the name of the plugin (and hence how you can invoke it with helm <plugin_name>) and the local file path of the binary or script that Helm is to execute.

At this point, Helm still hasn’t downloaded the binary/script, and has no idea how to do so. This information isn’t given directly to Helm; rather, the path to an install script can be specified in a hooks.install field in plugin.yaml, and Helm runs that script when the plugin is installed. Within this script you can specify the logic for downloading the plugin binary/script. When dealing with plugins in the form of a Go binary, this download often happens from an asset of a GitHub release. Alternatively, if the binary/script exists in the plugin’s Git repo (generally not considered best practice), then you wouldn’t need any hooks, as it’d already be present in the HELM_PLUGIN_DIR.

Check out our open-sourced plugin, helm-bulk.

Here’s the plugin.yaml:

name: "bulk"
version: "0.0.23"
usage: "Load or Save Helm Releases"
description: |-
  Load or Save Helm Releases from File to Cluster, or Cluster to File, respectively
command: "$HELM_PLUGIN_DIR/bin/helm-bulk"
hooks:
  install: "cd $HELM_PLUGIN_DIR; ./scripts/install.sh"
  update: "cd $HELM_PLUGIN_DIR; ./scripts/install.sh"

Also check out our install script:

# cd to the plugin dir, and stop if that fails
cd "$HELM_PLUGIN_DIR" || exit 1

# get the version (the quoted value on the "version:" line of plugin.yaml)
version="$(grep '^version:' plugin.yaml | cut -d '"' -f 2)"

# find the OS and ARCH
unameOut="$(uname -s)"

case "${unameOut}" in
    Linux*)     os=Linux;;
    Darwin*)    os=Darwin;;
    CYGWIN*)    os=Cygwin;;
    MINGW*)     os=windows;;
    *)          os="UNKNOWN:${unameOut}";;
esac

arch="$(uname -m)"

# set the url of the tar.gz
url="https://github.com/ovotech/helm-bulk/releases/download/v${version}/helm-bulk_${version}_${os}_${arch}.tar.gz"

# set the filename (everything after the last "/" in the url)
filename="${url##*/}"

# download the archive using curl or wget, whichever is installed
if command -v curl > /dev/null 2>&1
then
    # -f makes curl fail on HTTP errors instead of saving the error page
    curl -fsSL -O "$url"
elif command -v wget > /dev/null 2>&1
then
    wget -q "$url"
else
    echo "Need curl or wget" >&2
    exit 1
fi

# extract the plugin binary into the bin dir
rm -rf bin && mkdir bin && tar xzf "$filename" -C bin && rm -f "$filename"
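
To make the script’s string handling concrete, here is a small worked sketch of the version lookup and the download URL it builds. The helper names plugin_version and asset_url are made up for this example, and the Linux/x86_64 values are just sample inputs; the sketch only shows what the script would request, not that a given release asset exists.

```shell
# Read plugin.yaml content on stdin and print the quoted version value.
# (plugin_version and asset_url are hypothetical helpers for this sketch.)
plugin_version() {
    grep '^version:' | cut -d '"' -f 2
}

# Build the release asset URL from a version, an OS name and an architecture,
# following the same pattern as the install script.
asset_url() {
    printf 'https://github.com/ovotech/helm-bulk/releases/download/v%s/helm-bulk_%s_%s_%s.tar.gz\n' \
        "$1" "$1" "$2" "$3"
}

# Sample inputs: the version from the plugin.yaml above, on 64-bit Linux.
version="$(printf 'name: "bulk"\nversion: "0.0.23"\n' | plugin_version)"
url="$(asset_url "${version}" Linux x86_64)"

echo "${url}"
# Strip everything up to the last "/" to get the archive's file name.
echo "${url##*/}"
```

The second line printed is the file name the script later passes to tar.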

Wrapping up, we can now install the plugin with the single command:

$ helm plugin install https://github.com/ovotech/helm-bulk

...and execute via Helm, for example:

$ helm bulk save -s=<csr_server_name>

Note, <csr_server_name> is the string you used for the CA cert and the CSR subjects when setting up TLS on Tiller. It’s strongly recommended to use TLS, but if you haven’t set up Tiller to use it, you can use the -t, --disable-tls flag to disable it when using Helm Bulk.

Provided a connection can be established to Tiller (i.e. you’ve authenticated into your cluster) this will result in a helm-releases.tar.gz file in your current working directory.

Now take a look at which Helm releases have been passivated:

$ helm bulk show

And take a look at which releases the plugin would re-install by using the -r, --dry-run flag:

$ helm bulk load -s=<csr_server_name> -r

By default, when loading from file to cluster, helm bulk will ignore any releases that already exist in the cluster. So if you run the helm bulk load command above, and all releases in the file are already present in the cluster, you’ll get a No Releases found to install message back. Helm-Bulk is therefore idempotent: you can issue the load command as many times as you want, and the end result will be that all of the Helm releases you originally passivated are installed in your cluster.
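The skip-what-exists behaviour boils down to a set difference between the release names in the backup and those already in the cluster. The sketch below shows that idea in shell; it is not the plugin’s actual Go code, and the function name, file names and release names are invented for the example.

```shell
# Print the releases named in the backup that are not yet in the cluster.
# $1: file of release names currently in the cluster, one per line
# $2: file of release names from the backup, one per line
# (releases_to_install is a hypothetical helper for this sketch.)
releases_to_install() {
    # Remember every name from the first file, then print only the names
    # from the second file that were not seen.
    awk 'FILENAME == ARGV[1] { existing[$0] = 1; next }
         !($0 in existing)' "$1" "$2"
}

# Example with made-up release names.
workdir="$(mktemp -d)"
printf 'ingress\n' > "${workdir}/in-cluster.txt"
printf 'ingress\nprometheus\n' > "${workdir}/in-backup.txt"

releases_to_install "${workdir}/in-cluster.txt" "${workdir}/in-backup.txt"
```

Running the same comparison again after everything has been installed yields an empty list, which is the situation in which the plugin reports that there is nothing to install.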

Scheduled Backups

What use would this functionality be if we didn’t schedule it to run automatically? How frequently this should happen really depends on how many deployments you’re doing to your Kubernetes clusters. At OVO, in the projects that now make use of the plugin, several deployments can be performed each day, so we schedule helm bulk save to be run every 3 hours during office hours. Backups are pushed to GCS buckets, and are versioned, so we can restore the application state of a cluster from any backup.
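A scheduled backup job along these lines could be as small as the sketch below. It is only an outline under stated assumptions: the bucket name is a placeholder, the backup_releases function and the HELM/GSUTIL override variables are invented for the example, and the cron line in the comment is one possible reading of "every 3 hours during office hours".

```shell
# Hypothetical bucket; replace with your own versioned GCS bucket.
BACKUP_BUCKET="${BACKUP_BUCKET:-gs://example-helm-backups}"

# Save the current Helm releases and copy the archive to the bucket.
# $1: the server name to pass to helm bulk save via -s
# (backup_releases, HELM and GSUTIL are assumptions made for this sketch.)
backup_releases() {
    # helm bulk save writes helm-releases.tar.gz to the current directory.
    "${HELM:-helm}" bulk save -s="$1" || return 1
    # With object versioning enabled, reusing the same object name
    # keeps earlier backups available as older versions.
    "${GSUTIL:-gsutil}" cp helm-releases.tar.gz "${BACKUP_BUCKET}/helm-releases.tar.gz"
}

# Example crontab entry (assumed schedule: 09:00, 12:00 and 15:00, Mon-Fri):
#   0 9-17/3 * * 1-5  /path/to/backup.sh
```

Keeping the command names overridable makes it easy to dry-run the job, for example by setting both to echo.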

A final but important note on the tar.gz files that come out of running helm bulk save: they’re not encrypted. The plugin doesn’t handle encryption of the release information for you, and seeing as you can (and will at times have to) set sensitive data in a Helm release, that data will also be present in the backups (albeit base64 encoded). Encrypting the backup files could represent a future extension of the plugin, but for now, you’ll have to ensure you don’t store the backup files in any insecure places (or better still, don’t use secrets in Helm releases, though that may not be feasible).
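To see why base64 encoding offers no protection, consider this short sketch. The value is a made-up secret; anyone holding the backup file can reverse the encoding without any key.

```shell
# A made-up secret value, encoded the way it might appear in a backup.
encoded="$(printf '%s' 'hunter2' | base64)"
echo "${encoded}"

# Decoding needs no key or password, only the base64 tool itself.
decoded="$(printf '%s' "${encoded}" | base64 --decode)"
echo "${decoded}"
```

The round trip returns the original value, so the backups should be treated as if they contained the secrets in plain text.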

Cluster Cloning

After taking a backup of a cluster using helm bulk save, there’s nothing tying that backup file to the specific cluster you created it from. It’s possible, therefore, to restore it into a new cluster, allowing you to clone clusters very easily. Currently this won’t work if any packages are required to be singletons, e.g. an ingress-controller that uses a static IP address. To get round this you'd have to delete the original cluster prior to restoring from backup. A future enhancement to the Helm Bulk plugin could be to define an exclusion list for packages with a more involved restore process.

Conclusion

At OVO we wrote the Helm Bulk plugin as a means of reducing human latency and ultimately time to recovery. We spotted a gap in the functionality of Helm and added commands via a plugin to save and load releases from and to clusters respectively. We took advantage of Helm’s plugin model to start using our plugin quickly, without having to wait for our code changes to be accepted into Helm core.

The plugin model adopted by Helm and Git, and I’m sure by lots of other open source projects, benefits not only us as users, who can move quickly and safely with cool ideas, but also project maintainers, who can continue to apply their rigorous acceptance process and receive PRs of a good standard.

This blog should hopefully demystify the process of getting started with a Helm plugin. Once you understand the relationship between Helm, the config (e.g. plugin.yaml) and your binary/script, it should be plain sailing.
