Chris Nesbitt-Smith, February 14, 2022
This is a continuation of the PodSecurityPolicy is Dead, Long live…? article, which looks at how to construct the most effective policy for your Kubernetes infrastructure. Haven’t read that? Check it out first.
Based on that foundation, this article looks at how versioning policies streamline the developer experience to deliver features and minimise downtime whilst meeting compliance requirements.
‘Policy as code’ is one of the more recent ‘as-code’ buzzwords to enter the discourse since ‘infrastructure as code’ paved the way for the *-as-code term. Its fundamental principles sound great: everything in version control, auditable, repeatable, and so on. In practice, however, it can often fall apart on the day-2 operational challenges, which are only exacerbated by adopting ‘GitOps’.
We’ll look at a common scenario and present a working example of versioned policy running through the entire process to address the issue.
Let’s start with a likely (simplified) scenario: developers write code, raise pull requests, have them peer reviewed and merged, and CI builds and deploys the result to the cluster. That’s just your ‘business as usual’ flow for all your devs. Then the security team wants to roll out a new policy rule.
Your policy engine might allow you to ‘dry run’ a new policy rule before you ‘enforce’ it, by putting it in a ‘warning’, ‘audit’ or ‘background’ mode where a warning is recorded in the event log whenever something breaks the new rule.
But that will only happen if the API server re-evaluates the resources, which usually only occurs when the pod reschedules. Again, someone needs to be monitoring the event logs and acting on them, which can introduce its own challenges in exposing those logs to your teams.
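For illustration only: if your engine of choice happened to be Kyverno, a rule being trialled in that mode might look something like the sketch below. The policy name and the rule itself are made up for this example, and other engines (OPA Gatekeeper, jsPolicy and so on) have equivalent mechanisms.

```yaml
# Illustrative Kyverno policy running in Audit mode: violations are recorded
# in policy reports/events rather than blocking admission.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag          # illustrative name, not part of the example org's policy
spec:
  validationFailureAction: Audit     # flip to Enforce once teams have had time to react
  background: true                   # also scan existing resources, not just new admissions
  rules:
    - name: require-pinned-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Container images must not use the 'latest' tag."
        pattern:
          spec:
            containers:
              - image: "!*:latest"   # any tag except 'latest'
```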
All of that activity is happening a long way from the developers that are going to do something about it.
Furthermore, communicating that policy update between the well-intentioned security team and the developers is fraught with the bureaucratic concerns common to organisations at scale. The security policy itself might be considered somehow sensitive, since it may reveal potential weaknesses. Consequently, reproducing that policy configuration in a local development environment may also prove impracticable. All of this is made much, much worse once you have multiple clusters for development, staging and production, plus multi-tenancy, with multiple teams and applications co-existing in the same cluster, all with their own varying needs.
First and foremost, sharing the policy is imperative. Your organisation absolutely has to accept that the advantages of exposing the policy and communicating it effectively to its developers far outweigh any potential security advantage gained through obscurity.
Along with sharing the policy, you need to articulate the benefits of each and every rule. After all, you’ve hopefully hired some smart people, and smart people will try to find workarounds when they don’t see value in the obstruction.
Explaining the policy should hopefully help you justify it to yourselves too. Rules should naturally become less emotional and anecdotal, and instead be grounded in informed threat modelling that’s easier to maintain as your threat landscape changes.
The next step is collecting the policy, codifying it, and keeping it in version control. Once it’s in version control, you can adopt the same semantic versioning strategy your developers will be used to seeing elsewhere.
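Once the policy is tagged, anything that deploys it can pin to a tag. As a minimal sketch, assuming the policy bundle lives in a repository laid out something like the one below (the URL and path are my assumption, not the actual layout of the example organisation later in this article), a cluster’s config repo could pull a specific release with kustomize:

```yaml
# kustomization.yaml in a cluster's config repo, pinning the policy bundle to a
# specific release tag. Repository URL and path are illustrative.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - https://github.com/example-policy-org/policy//policies?ref=2.0.1
```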
Great, so you’ve got your policy definitions in version control, tagged with semantic versions. The next step is consuming them within your applications, so your developers can test their applications against them: locally to start with, then later in continuous integration.
Hopefully your developers will at least be used to this: they can treat your policy like any other versioned dependency.
Now that they’re testing locally, implementing the same check in CI should be straightforward, and it will ensure that peer reviews are only ever carried out on code that is known to pass your policy.
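As a rough sketch of that CI half, assuming GitHub Actions and assuming the checker image is published with tags that track policy releases (both assumptions on my part), the job could be as small as this:

```yaml
# .github/workflows/policy-check.yaml -- illustrative only; the image tag
# matching the policy version is an assumption about how releases are published.
name: policy-check
on: [pull_request]
jobs:
  policy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check the app against the company policy
        run: docker run --rm -v "$PWD":/apps ghcr.io/example-policy-org/policy-checker:2.0.1
```

Pinning an explicit version here is also what gives the update tooling below something concrete to bump.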
Given the policy is now a dependency, you can use tools like Snyk, Dependabot or Renovate to automate pull requests that keep it up to date, and to highlight to your developers when a policy update is not compatible with their app.
Awesome. Now for the really tricky bit…
From a risk perspective, your organisation needs to be comfortable accepting a transition period in which old policy versions are retired. That comes down to communication between those setting the policy and those consuming, or subjected to, it. Supporting the current version plus one forwards and one backwards means your runtime needs to support at least three versions of the policy at any one time.
I’ve put together a reference model of this in a dedicated GitHub organisation with a handful of repositories. Renovate was used to make automated pull requests on policy updates; you can see examples of that in the repositories.
Other tools:
Please, allow me to introduce you to Example Policy Org
This app is compliant with version 1.0.0 of the company policy only; the pull request to update the policy to 2.0.1 currently can’t merge.
This app is compliant with version 2.0.1 of the company policy.
This app is compliant with version 2.0.1 of the company policy, but it’s only using 1.0.0 and can be updated with a pull request.
There’s only one simple policy here: it requires that every resource has a mycompany.com/department label; so long as it’s set, the value doesn’t matter.
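If that rule were written for Kyverno, say, 1.0.0 might look roughly like the following. This is a hedged sketch rather than the actual policy from the example organisation, which could just as well use a different engine:

```yaml
# Sketch of the 1.0.0 rule: the mycompany.com/department label must be present,
# but any non-empty value is accepted. Matching only Pods and Deployments here
# is a simplification of "every resource".
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-department-label
spec:
  validationFailureAction: Enforce
  rules:
    - name: department-label-is-set
      match:
        any:
          - resources:
              kinds:
                - Pod
                - Deployment
      validate:
        message: "Every resource must carry a mycompany.com/department label."
        pattern:
          metadata:
            labels:
              mycompany.com/department: "?*"   # at least one character, any value
```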
Following on from 1.0.0, we found that the lack of consistency wasn’t helping: some people were setting it to ‘hr’, others to ‘human resources’. So a breaking policy change has been introduced (hence the major version bump) to require the value to come from a known, pre-determined list. The mycompany.com/department label must now be one of tech|accounts|servicedesk|hr.
The policy team forgot a department! So now the mycompany.com/department label must be one of tech|accounts|servicedesk|hr|sales. This was a non-breaking, very minor change, so we’re going to consider it a patch update and only increment the last segment of the version number.
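Sticking with the same hedged Kyverno sketch, the 2.0.1 rule only really tightens the validation so the value has to come from the approved list:

```yaml
# Sketch of the 2.0.1 rule: same shape as before, but the label value must now
# be one of the approved department names (sales was added in this patch release).
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-department-label
spec:
  validationFailureAction: Enforce
  rules:
    - name: department-label-is-valid
      match:
        any:
          - resources:
              kinds:
                - Pod
                - Deployment
      validate:
        message: "mycompany.com/department must be one of tech, accounts, servicedesk, hr or sales."
        anyPattern:
          - metadata:
              labels:
                mycompany.com/department: tech
          - metadata:
              labels:
                mycompany.com/department: accounts
          - metadata:
              labels:
                mycompany.com/department: servicedesk
          - metadata:
              labels:
                mycompany.com/department: hr
          - metadata:
              labels:
                mycompany.com/department: sales
```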
This is an example of everything coexisting on a single cluster for simplicity. To keep it free to run, I stand the cluster up each time using KiND, but this could just as well be one or more real clusters.
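For what it’s worth, a minimal KiND config is all a demo like this needs; something along these lines (nothing here is specific to the example organisation):

```yaml
# kind-config.yaml -- a single-node throwaway cluster for demoing the policy.
# Create it with: kind create cluster --config kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
```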
This is a simple tool to help our developers test their apps: from inside the app repo they can simply run docker run --rm -ti -v $(pwd):/apps ghcr.io/example-policy-org/policy-checker and it’ll test whether the app passes.
The location of the policy is intentionally hard-coded; making this reusable outside of our example organisation would take some significant thought and is out of scope for this article.
What I haven’t done is require the mycompany.com/policy-version label; that’s probably part of the policy-checker and CI process’s job. It’s also up to your cluster administrators what they do with things that don’t carry the label: you might, for example, exclude anything in kube-system and otherwise require mycompany.com/policy-version >= 1.0.0, updating that minimum version as required. In reality it’s just another rule, only kept separate from the rest of the policy codebase.
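Such a rule might look something like the sketch below, again assuming Kyverno. Enforcing a true minimum-version comparison would need your engine’s semver support, so this sketch only checks that the label is present and leaves kube-system alone:

```yaml
# Sketch: require workloads outside kube-system to declare which policy version
# they were built against. Enforcing a minimum (e.g. >= 1.0.0) would be layered
# on top using the engine's semver comparison support.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-policy-version-label
spec:
  validationFailureAction: Enforce
  rules:
    - name: policy-version-label-is-set
      match:
        any:
          - resources:
              kinds:
                - Pod
                - Deployment
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
      validate:
        message: "Resources must declare the mycompany.com/policy-version they were tested against."
        pattern:
          metadata:
            labels:
              mycompany.com/policy-version: "?*"
```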
You should be able to reuse the principles covered in this article to go forth and version your organisation’s policy, making the dev experience a well-informed, compliant breeze.
As you can see, this is far from the finished article. If you think there’s a better answer, or more to it, tweet us and share your thoughts!