What is Policy As [versioned] Code?

This is a continuation of the PodSecurityPolicy is Dead, Long live…? article, which looks at how to construct the most effective policy for your Kubernetes infrastructure. Haven’t read that? Check it out first.

Based on that foundation, this article looks at how versioning policies streamline the developer experience to deliver features and minimise downtime whilst meeting compliance requirements.

“Policy as code” is one of the more recent ‘as-code’ buzzwords to enter the discourse since ‘infrastructure-as-code’ paved the way for the *-as-code term. The fundamental principles of it sound great: everything in version control, auditable, repeatable, etc. However, in practice, it can often fall apart when it comes to the day 2 operational challenges which are exacerbated by adopting ‘GitOps’.

We’ll look at a common scenario and present a working example of versioned policy running through the entire process to address the issue.

The status quo

Let’s start with a likely (simplified) scenario:

  1. Person (a) writes a change to a deployment yaml file locally, yaml appears valid, so
  2. Person (a) pushes it to a branch and raises a pull request to the main/master branch requesting a review
  3. Person (b) looks at the diff, agrees with the change and approves it
  4. Person (a/b) merges the change causing the change to now be in the main/master branch
  5. CI/CD picks up the change and successfully applies the changed deployment yaml to the Kubernetes cluster
  6. The Deployment controller creates a ReplicaSet and submits it to the Kubernetes API, which accepts it
  7. The ReplicaSet controller creates pods and submits them to the API; this time the API server rejects them, because they fail a PodSecurityPolicy rule (or similar) in an admission controller.
    1. Unless you’re polling the API server for events on your deployment rollout, you won’t know it’s failed.
    2. Your main/master branch is now broken; you’ll either need to roll back the change or roll forward a fix, either by administering the cluster directly or by repeating the entire process from step 1.
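The failure at step 7 is surprising precisely because the manifest itself is perfectly valid. A minimal sketch of the kind of Deployment that triggers it (the names and image are hypothetical, and the admission rule is illustrative):

```yaml
# This Deployment is valid YAML and is accepted by the API server.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.25
          # ...but no securityContext is set, so if an admission policy
          # requires (say) runAsNonRoot, the pods created from this
          # template are rejected -- after the Deployment itself has
          # already been accepted and CI has reported success.
```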

That’s just your ‘business as usual’ flow for all your devs.

What happens when you want to update the policy itself?

Your policy engine might allow you to ‘dry run’ before you ‘enforce’ a new policy rule, by putting it in a ‘warning’, ‘audit’ or ‘background’ mode where, instead of blocking the request, a warning is recorded in the event log when something breaks the new rule.
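With Kyverno, for example, the difference between ‘dry run’ and ‘enforce’ is a single field on the policy. A sketch (the rule content here is illustrative, not a real company policy):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: new-rule-being-trialled
spec:
  # Audit records violations as events/policy reports without blocking
  # the request; switch to Enforce once you're confident nothing breaks.
  validationFailureAction: Audit
  rules:
    - name: example-rule
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "This is where your new rule's failure message goes."
        pattern:
          metadata:
            labels:
              mycompany.com/department: "?*"
```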

But that will only happen if the API server re-evaluates the resources, which usually only occurs when the pod reschedules. Again, someone needs to be monitoring the event logs and acting on them, which can introduce its own challenges in exposing those logs to your teams.

All of that activity is happening a long way from the developers that are going to do something about it.

Furthermore, communicating that policy update between the well-intentioned security team and developers is fraught with common bureaucratic concerns frequently found in organizations at scale. The security policy itself might be considered somehow sensitive as it may reveal potential weaknesses.

Consequently, reproducing that policy configuration in a local development environment may also prove impracticable. This is all made much, much worse with multiple clusters for development, staging, production and multi-tenancies with multiple teams and applications co-existing in the same cluster space all with their own varying needs.

So what can you do about all of this?

First and foremost, sharing the policy is imperative. Your organisation has to accept that the advantages of exposing policy, and communicating it effectively to developers, far outweigh any security gained through obscurity.

Along with sharing the policy, you need to articulate the benefit of each and every rule. After all, you’ve hopefully hired some smart people, and smart people will find workarounds when they don’t see value in an obstruction.

Explaining the policy should help you justify it to yourselves, too. Rules should naturally become less emotional and anecdotal, and instead be grounded in informed threat modelling that’s easier to maintain as your threat landscape changes.

The next step is collecting the policy, codifying it, and keeping it in version control. Once it’s in version control, you can adopt the same semantic versioning strategy seen elsewhere that your developers will be used to.

Quick recap: semantic versioning

  • Version numbers look like 1.20.30
  • Each segment can be any non-negative number without leading zeros (1.2.3 is fine, as is 1.200.30000)
  • Don’t be fooled by the decimal points, they’re not real (1.20.0 is greater than 1.3.0)
  • The first number is the major version, bumped when you make wholly incompatible breaking changes (this will probably be the case for almost all your policy changes)
  • The second is the minor version, bumped when you add functionality in a backwards-compatible way (less likely for policy changes)
  • The third is the patch version, for backwards-compatible bug fixes (likely quite rare for your policy changes)
  • For more detail see the Semantic Versioning website

Great, so you’ve got your policy definitions in version control, tagged with semantic versions. The next step is consuming that within your applications so your developers can test their applications against it: locally to start with, then later in continuous integration.

Hopefully your developers will be used to this at least: they can treat your policy like any other versioned dependency.
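One way to express that dependency, and the approach the reference model later in this article takes, is a label pinning the workload to a policy version. A sketch of what that might look like (the app name and image are hypothetical):

```yaml
# Hypothetical app manifest pinned to a policy version via a label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app1
  labels:
    mycompany.com/department: hr
    # Treated like a dependency version: bumped via a pull request,
    # e.g. one raised automatically by a tool like Renovate.
    mycompany.com/policy-version: "1.0.0"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      labels:
        app: app1
        mycompany.com/department: hr
        mycompany.com/policy-version: "1.0.0"
    spec:
      containers:
        - name: app1
          image: nginx:1.25
```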

Now that they’re testing locally, implementing the same check in CI should be straightforward. This assures that peer reviews are only ever carried out on code that is known to pass your policy.
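A minimal GitHub Actions sketch of that CI check, reusing the policy-checker image from the reference model described later in this article (the workflow itself is an assumption, not copied from the reference repos):

```yaml
# .github/workflows/policy-check.yml (illustrative)
name: policy-check
on:
  pull_request:
jobs:
  policy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Run the same check developers run locally, so a pull request
      # can only be reviewed once the manifests are known to pass.
      - name: Check manifests against company policy
        run: docker run --rm -v "$PWD:/apps" ghcr.io/example-policy-org/policy-checker
```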

Given it’s now a dependency, you can use tools like Snyk, Dependabot or Renovate to automate pull requests that keep it updated, and to highlight to your developers when a policy update is not compatible with their app.

Awesome. Now for the really tricky bit…

Your runtime needs to support multiple policy versions 😱

From a risk perspective, your organisation needs to be comfortable accepting a transition period while old policy versions are retired, which comes down to communication between those setting the policy and those consuming (or subjected to) it. If you support one version forwards and one backwards of the current version, your runtime needs to support at least three major versions at any one time.
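With a label-based policy engine, supporting versions side by side means scoping each versioned rule set to the workloads that declare that version. A Kyverno-flavoured sketch (the selector approach is the point here; the rule body is illustrative):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: company-policy-2-0-1
spec:
  validationFailureAction: Enforce
  rules:
    - name: department-from-allowed-list
      match:
        any:
          - resources:
              kinds:
                - Pod
              # Only apply the 2.0.1 rules to workloads that have opted
              # in to that policy version; workloads still on 1.0.0 are
              # matched by a separate ClusterPolicy during the transition.
              selector:
                matchLabels:
                  mycompany.com/policy-version: "2.0.1"
      validate:
        message: "mycompany.com/department must be one of tech|accounts|servicedesk|hr|sales"
        pattern:
          metadata:
            labels:
              # In Kyverno patterns, | acts as a logical OR.
              mycompany.com/department: "tech | accounts | servicedesk | hr | sales"
```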

Show me the code

I’ve put together a reference model of this in a dedicated GitHub organisation with a bunch of repositories. Renovate was used to make automated pull requests on policy updates; you can see examples of that there.

Other tools:

  • Kyverno as the policy engine, but any policy engine that lets you select resources by label should work.
  • GitHub Actions for CI/CD, but anything similar that integrates with version control and pull/merge requests should work.
  • GitHub for version control, but any similar git service with pull request capability and linked tests should work.
  • KiND for the Kubernetes cluster, but any Kubernetes cluster should work; this just let me do all the testing quickly.
  • Renovate to automatically maintain the policy dependencies by raising pull requests for us.

Please, allow me to introduce you to Example Policy Org

app1

This app is compliant with version 1.0.0 of the company policy only; the pull request to update the policy to 2.0.1 currently can’t merge.

app2

This app is compliant with version 2.0.1 of the company policy.

app3

This app is compliant with version 2.0.1 of the company policy, but it’s only using 1.0.0 and can be updated with a pull request.

policy

1.0.0

Only one simple policy here: it requires that every resource has a mycompany.com/department label; as long as it’s set, the value doesn’t matter.
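A Kyverno rule along these lines might look as follows (a sketch, not the exact file from the repo):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-department-label
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-department
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "All resources must carry a mycompany.com/department label."
        pattern:
          metadata:
            labels:
              # "?*" means the label must exist with any non-empty value.
              mycompany.com/department: "?*"
```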

2.0.0

Following on from 1.0.0, we found that the lack of consistency wasn’t helping: some people were setting it to ‘hr’, others ‘human resources’. So a breaking policy change has been introduced (hence the major version bump) to require the value to come from a known, pre-determined list: the mycompany.com/department label must now be one of tech|accounts|servicedesk|hr.
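The 2.0.0 change only really touches the validation pattern. A sketch, using Kyverno’s pattern syntax, where | acts as a logical OR between allowed values:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-department-from-list
spec:
  validationFailureAction: Enforce
  rules:
    - name: department-from-allowed-list
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "mycompany.com/department must be one of tech|accounts|servicedesk|hr"
        pattern:
          metadata:
            labels:
              # Any value outside this list is rejected -- the breaking change.
              mycompany.com/department: "tech | accounts | servicedesk | hr"
```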

2.0.1

The policy team forgot a department! So now the mycompany.com/department label must be one of tech|accounts|servicedesk|hr|sales. This was a non-breaking, very minor change, so we’re going to consider it a patch update and only increment the last segment of the version number.

e2e

This is an example of everything coexisting on a single cluster. For simplicity, and to keep this free to run, I stand up the cluster each time using KiND, but this could just as well be a real cluster (or clusters).

policy-checker

This is a simple tool to help our developers test their apps: they can simply run docker run --rm -ti -v $(pwd):/apps ghcr.io/example-policy-org/policy-checker from within the app directory and it’ll test whether the app passes.

The location of the policy is intentionally hard-coded; making this reusable outside of our example organisation would take some significant thought and is out of scope for this article.

Caveats

What I haven’t done is require the mycompany.com/policy-version label. That’s probably a job for the policy-checker and CI process, and it’s also up to your cluster administrators what they do with things that don’t have the label: you might, for example, exclude anything in kube-system, and otherwise require mycompany.com/policy-version >= 1.0.0, updating that minimum version as required. In reality it’s just another rule, but one kept separate from the rest of the policy codebase.
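Were you to enforce that extra rule in-cluster, it might look something like this (a sketch; note a simple pattern can only require the label’s presence, not compare versions, so version-range checks would still live in the policy-checker or CI):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-policy-version-label
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-policy-version
      match:
        any:
          - resources:
              kinds:
                - Pod
      # System workloads are exempt from declaring a policy version.
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
      validate:
        message: "Workloads must declare which policy version they target."
        pattern:
          metadata:
            labels:
              mycompany.com/policy-version: "?*"
```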

Now, it’s your turn

You should be able to reuse the principles of what we’ve covered in this article to go forth and version your organization’s policy and make the dev experience a well-informed compliant breeze.

As you can see, this is far from the finished article. If you think there’s a better answer, or that there’s more to it, share your thoughts and tweet us!

About Appvia

Appvia enables businesses to solve complex cloud challenges with products and services that make Kubernetes secure, cost-effective and scalable.

Our founders have worked with Kubernetes in highly regulated, highly secure environments since 2016, contributing heavily to innovative projects such as Kops and fully utilizing Kubernetes ahead of the curve. We’ve mastered Kubernetes, and experienced its complexities, so our customers don’t have to. 

Chris Nesbitt-Smith
SOLUTION ARCHITECT
A developer at heart that’s now more focused on people and less on the technical implementation detail. Although, when I’m not building stuff with the kids, I spend time on open source work and co-maintain a few high-profile repositories in the home automation space.
