We have used this technology internally for two years, and the time has come to share it with the community as open source.
Read on to find out how it can help you, how you can contribute, and what the story behind the release is!
Our motivation was to supply developers with tooling that lets them use secrets efficiently. A Secret is an object that contains a small amount of sensitive data such as a password, a token, or a key. As you might guess, such data can’t be hardcoded in your application code. Secrets should be stored and accessed securely. Vault from HashiCorp is our go-to tool for secrets storage, as it is cloud-provider agnostic. This way, secrets are in one place with defined policies and are available only to trusted clients.
Our story starts around 2018. At that time, we accessed Vault directly from each application, which caused a lot of problems:
- The Vault server was heavily abused, handling many unnecessary requests to retrieve secrets, which led to instability. For example, the same secret was retrieved on every request even though it rarely changed; loading it once at application startup would have been enough.
- Every Vault outage affected applications directly. If Vault was not running, an app couldn’t be restarted or released, because it loaded its secrets during startup.
This state was not up to company standards, and we were searching for a way to optimise the handling of secrets. So in 2018, we introduced our Thief library, a secret manager for Vault with caching. It removed the hard dependency on Vault, so Vault outages didn’t affect applications as much. However, it had its drawbacks. Because it was a library, every application needed a code change to use it. This created cognitive load for developers and also for the Platform team, as we had to support the library in different programming languages. Ultimately, this was very inconvenient.
After we had migrated all applications to Kubernetes by the end of 2019, we started to explore other solutions that were available on the market at that time. The only available solution provided directly by HashiCorp was a sidecar, which had drawbacks similar to those described above. There was no cache, so if Vault was down, every app would suffer from the outage.
Shortly after, we decided to use Kubernetes secrets as our cache storage. The first version of the solution, which we used in 2019, was very simple. At Kiwi.com, we use Terraform heavily to manage our infrastructure. We implemented a solution that used Terraform to synchronize secrets from Vault into k8s secrets, and we exposed those k8s secrets to deployments as environment variables. This solution was closer to our vision of how the system should work. But it wasn’t flexible enough, because we needed to run `terraform apply` for every secret, and it was tricky to extend the solution to more complicated use cases.
So this is how the idea of a Kubernetes operator that creates k8s secrets from Vault secrets was born.
How does it work?
The purpose is pretty basic: based on the CRD manifest, the Vault operator synchronizes Vault secrets into k8s secrets and adds extra metadata.
The operator logs in to Vault using the Kubernetes Auth Method. If you want to know more, we recommend going directly to the source: great docs on the Kubernetes Auth Method are available on the HashiCorp website. The configuration part needs to be done manually; it’s not something that the operator handles, but it’s crucial for the operator to work.
- The operator inside the k8s cluster obtains a JWT token:
  - using the automounted token in `/run/secrets/kubernetes.io/serviceaccount/token`, if available
  - otherwise, by creating a new token for the specified service account (SA)
- This JWT token, together with a role, is sent to the Vault server to obtain a short-lived token
- Using the short-lived Vault token, the operator can make requests to the Vault server, such as reading secrets
There is an option to use a long-lived Vault token, but we recommend it only for local development.
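The login flow above can be sketched in a few lines. This is a minimal illustration, not the operator's actual code: it uses Vault's HTTP API for the Kubernetes auth method (`POST /v1/auth/<mount>/login`), and the function names, the `kubernetes` mount name, and the `vault.example.com` address are assumptions made for the example.

```python
import json
import urllib.request
from pathlib import Path

# Path from the text above; the token is only present when automount is enabled.
SA_TOKEN_PATH = Path("/run/secrets/kubernetes.io/serviceaccount/token")


def build_login_request(vault_addr: str, role: str, jwt: str,
                        mount: str = "kubernetes") -> urllib.request.Request:
    """Build the request that exchanges a service-account JWT for a Vault token.

    `mount` is the path where the Kubernetes auth method is enabled
    (assumed to be the default, "kubernetes").
    """
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_client_token(login_response: dict) -> str:
    """Return the short-lived Vault token from a parsed login response."""
    return login_response["auth"]["client_token"]


def build_read_request(vault_addr: str, secret_path: str,
                       vault_token: str) -> urllib.request.Request:
    """Build an authenticated read request.

    The exact API path depends on the secrets engine (KV v2 inserts `data/`
    after the mount); this sketch passes the path through unchanged.
    """
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/{secret_path.lstrip('/')}",
        headers={"X-Vault-Token": vault_token},
        method="GET",
    )


def login(vault_addr: str, role: str) -> str:
    """Read the automounted JWT and log in to Vault (performs a network call)."""
    jwt = SA_TOKEN_PATH.read_text().strip()
    with urllib.request.urlopen(build_login_request(vault_addr, role, jwt)) as resp:
        return extract_client_token(json.load(resp))
```

Inside a pod, `login("https://vault.example.com", "my-namespace")` would return a token that can then be passed to `build_read_request`. Keeping the request builders separate from the network call makes the flow easy to test without a running Vault.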
What does the manifest look like?
VaultSecret is a custom resource, defined by a CRD, that the Vault operator watches.
A manifest defines how secrets should be synced:
- from which Vault paths
  - we support multiple secrets and merge them into a single secret
- how often reconciliation should occur
- how to authorize against Vault
  - usually it’s enough to just define it inside the ConfigMap of the operator
- path: secret/recursive/path/*
- path: secret/my/sub/path/my-secret
# Those are defaults - you don't need to specify them!
targetSecretName: test # same as VaultSecret name
role: my-namespace # same as the namespace
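The two defaults noted in the comments above can be expressed as a small function. This is only a sketch of the defaulting rule as described there (target secret name falls back to the VaultSecret name, role falls back to the namespace); `VaultSecretSpec` and `apply_defaults` are hypothetical names, not the operator's real types.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VaultSecretSpec:
    """Hypothetical, simplified view of the fields shown in the manifest."""
    paths: List[str]
    target_secret_name: Optional[str] = None
    role: Optional[str] = None


def apply_defaults(name: str, namespace: str, spec: VaultSecretSpec) -> VaultSecretSpec:
    """Fill in unset fields: the secret name defaults to the VaultSecret name,
    and the role defaults to the namespace."""
    return VaultSecretSpec(
        paths=list(spec.paths),
        target_secret_name=spec.target_secret_name or name,
        role=spec.role or namespace,
    )


spec = apply_defaults(
    name="test",
    namespace="my-namespace",
    spec=VaultSecretSpec(paths=["secret/recursive/path/*", "secret/my/sub/path/my-secret"]),
)
print(spec.target_secret_name, spec.role)  # prints "test my-namespace"
```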
Reconciliation is the phase in which the operator tries to sync secrets from Vault to k8s. Each VaultSecret manifest has its own reconciliation loop:
- Retrieve the current VaultSecret manifest and validate it
- Log in to Vault
- Read all paths and merge them into a single secret
- Get the k8s secret, if it exists
- Create or update the k8s secret if the data differ, and set the VaultSecret as the owner of the k8s secret
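The last two steps, comparing against the existing k8s secret and writing only when something changed, can be sketched as follows. This is an illustration under assumptions, not the operator's implementation: `FakeCluster` is an in-memory stand-in for the Kubernetes API, `desired` is the already-merged data read from Vault, and validation and login are left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class K8sSecret:
    name: str
    data: Dict[str, str]
    owner: str  # name of the VaultSecret that owns this secret


class FakeCluster:
    """In-memory stand-in for the Kubernetes API (hypothetical)."""

    def __init__(self) -> None:
        self.secrets: Dict[str, K8sSecret] = {}

    def get(self, name: str) -> Optional[K8sSecret]:
        return self.secrets.get(name)

    def apply(self, secret: K8sSecret) -> None:
        self.secrets[secret.name] = secret


def reconcile(vault_secret_name: str, target_name: str,
              desired: Dict[str, str], cluster: FakeCluster) -> str:
    """Create or update the target secret only if its data differ.

    Returns "created", "updated", or "unchanged" so callers can log the outcome.
    """
    current = cluster.get(target_name)
    if current is not None and current.data == desired:
        return "unchanged"  # nothing to write, avoid a needless update
    cluster.apply(K8sSecret(name=target_name, data=dict(desired), owner=vault_secret_name))
    return "created" if current is None else "updated"
```

Skipping the write when the data are identical keeps the loop cheap to run repeatedly, which matters when every VaultSecret is reconciled on a schedule.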
Example of how secrets are merged from multiple paths:
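As a rough illustration, assume each Vault path yields a flat key-value map and that, when two paths contain the same key, the later path wins. The collision rule is an assumption made for this sketch, not confirmed behaviour of the operator, and the keys and values are invented.

```python
from typing import Dict, Iterable, Mapping


def merge_secrets(secrets: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Merge several key-value maps into one; later maps override earlier ones."""
    merged: Dict[str, str] = {}
    for data in secrets:
        merged.update(data)
    return merged


# Hypothetical contents of two Vault paths.
database = {"DB_USER": "app", "DB_PASSWORD": "s3cr3t"}
api = {"API_KEY": "abc123", "DB_USER": "override"}

merged = merge_secrets([database, api])
print(merged)
# {'DB_USER': 'override', 'DB_PASSWORD': 's3cr3t', 'API_KEY': 'abc123'}
```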
Our Vault operator was stable and battle-tested in our clusters. After HashiCorp published a blog post saying that they plan to move towards developing an operator, the idea arose to open-source our solution and make it available to the community.
We contacted HashiCorp and set up a meeting to talk about what we already had and what they would need. The goal was to align our views on purpose and need. Our operator is more basic than HashiCorp’s vision for the tool and is tailored to our use cases.
We never needed a super complex solution (e.g. several auth mechanisms) for our use cases, so we did not aspire to build one before open-sourcing it. But as we saw the demand, and the struggle of professionals around the world with the same issue, we decided it made sense to release, at the very least, the solution in its current setup to the community.
At the point of the release, our codebase was stable but old. Before releasing it to the public, we made some enhancements:
- Dropped the old version of Operator SDK and switched to using kubebuilder directly
- Cleaned up the codebase by removing old features that weren’t useful anymore
- Fixed some old bugs
- Switched the tests to a test-environment cluster instead of a real one, so we can run them easily in our CI/CD pipeline
- Updated the documentation, made it more general, and removed Kiwi.com specifics
A new adventure starts
So, there you have it – the story behind the Vault Operator for k8s by Kiwi.com.
Feeling inspired? Contribute to the project: https://github.com/kiwicom/k8s-vault-operator
Have any questions, ideas or notes? Let us know by commenting in the repository!
And of course, if you want to help us develop amazing things at Kiwi.com, go and check out jobs.kiwi.com and let us know!
We would like to thank all the engineers who have worked on the Vault operator for a job well done, and of course also HashiCorp for being very approachable and discussing all the important aspects with us.
Written by Dávid Mikuš and Yurii Kyrychynskyi @ Kiwi.com