Chris Nesbitt-Smith, February 1, 2022
TL;DR PodSecurityPolicy exists in Kubernetes to provide security controls for pods. PSPs were deprecated in 1.21 (April 2021) and will be removed entirely in 1.25 (expected later in 2022). This article explains what PSPs are and what the alternatives look like. We created a PSP migration tool that translates existing PSPs to three different policy engines. This blog will also cover why using policies might be making things worse, not better, and how these checks could be carried out more efficiently with good CI/CD practices.
In Kubernetes, security isn’t done for you. By default workloads will run with administrative access, which might be acceptable if there is only a single application running in the cluster. That is seldom the case (and if it is, you probably don’t need Kubernetes), so you’ll consequently suffer a ‘noisy neighbour’ effect along with a large security blast radius.
Many of these concerns are addressed by PodSecurityPolicies, which have been present in the API since the very early days of Kubernetes.
Pod Security Policies enable fine-grained authorization of pod creation and updates.
A Pod Security Policy is a cluster-level resource that controls security sensitive aspects of the pod specification. The PodSecurityPolicy objects define a set of conditions that a pod must run with in order to be accepted into the system, as well as defaults for the related fields. (from https://kubernetes.io/docs/concepts/policy/pod-security-policy/)
Using PSPs, cluster administrators can impose limits on pod creation, for example the types of volume that can be consumed, the Linux user that the process runs as (to avoid running things as root), and more.
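For illustration, a PSP that blocks privileged containers, forces workloads to run as a non-root user and restricts volume types might look something like this (the name and volume list are examples, not a recommendation):

# Illustrative only: a PSP that disallows privileged mode and root users.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restrict-root            # hypothetical name
spec:
  privileged: false              # no privileged containers
  allowPrivilegeEscalation: false
  runAsUser:
    rule: MustRunAsNonRoot       # reject pods whose containers run as UID 0
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:                       # no hostPath, to keep pods off the node filesystem
    - configMap
    - secret
    - emptyDir
    - persistentVolumeClaim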
When a pod is submitted to a Kubernetes cluster via its API, the request passes through a number of components before the pod is created. One of these is the admission controller, which intercepts the request before the resource is created and persisted in the etcd database. Dynamic admission controllers can be configured to call out to any web service (internal or external to the cluster).
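For context, dynamic admission is wired up with webhook configuration resources; a minimal sketch of a validating webhook registration, assuming a webhook service called policy-webhook in a policy-system namespace (both names are made up here):

# Sketch of a ValidatingWebhookConfiguration; the service name and namespace are assumptions.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-webhook
webhooks:
  - name: pods.example.dev
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail              # reject requests if the webhook is unreachable
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
        operations: ["CREATE", "UPDATE"]
    clientConfig:
      service:
        name: policy-webhook         # hypothetical in-cluster service
        namespace: policy-system
        path: /validate
      # caBundle: <base64-encoded CA used to verify the webhook's TLS certificate>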
Though PSP predates the modern dynamic admission controller, it operates in effect as a (sometimes mutating) admission controller, meaning that it doesn’t affect pods that are already scheduled, only new ones being created or updated.
PodSecurityPolicy describes certain restrictions on pods that a user or service account can create, but how often do you create pods without a higher-level controller like a Deployment, DaemonSet, StatefulSet or Job doing the heavy lifting? In those cases the pods are created by a controller acting under a service account, not by you, so already we’re stumbling over some of the usability issues.
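It is also worth remembering how a PSP actually takes effect: it has to be authorised to the creating user or the pod’s service account with the RBAC ‘use’ verb. A minimal sketch (role, binding and namespace names are illustrative):

# Sketch: a PSP is authorised by granting the 'use' verb via RBAC.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-example-user           # illustrative name
rules:
  - apiGroups: ["policy"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["example"]     # the PSP being granted
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: psp-example-user
  namespace: my-app                # illustrative namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp-example-user
subjects:
  - kind: ServiceAccount
    name: default                  # the service account the controller creates pods with
    namespace: my-app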
One of the issues with PSPs is that it is hard to apply restrictive permissions at a granular level, which inevitably increases security risk. PSPs are applied only when the request to create a pod is submitted to the cluster; there is no way of applying them to pods that are already running.
You can learn more about the issues with PSPs and the motivations for deprecating them in SIG Auth’s KubeCon NA 2019 Maintainer Track session video.
However, as you can see, PSPs can be useful in some cases. With PSPs being deprecated, is there another way to secure our pods in the cluster? There are many alternatives; this article will focus on the following three. Others are available, and if you maintain one please get in touch.
Kyverno is a policy engine designed for Kubernetes. It can validate, mutate, and generate configurations using admission controls and background scans. Kyverno policies are Kubernetes resources and do not require learning a new language. Kyverno is designed to work with tools you already use like kubectl, kustomize, and Git.
Kubewarden integrates with Kubernetes by providing a set of Custom Resources. These Custom Resources simplify the process of enforcing policies on your cluster.
Gatekeeper is a customizable admission webhook for Kubernetes that enforces policies executed by the Open Policy Agent (OPA), a policy engine for Cloud Native environments hosted by CNCF.
These tools can be installed in any Kubernetes cluster. There is also a built-in feature in Kubernetes, Pod Security Standards, which aims to replace some of the key PodSecurityPolicy capabilities within core Kubernetes without a third-party add-on.
Pod Security Standards are most easily thought of as three rigidly predefined Pod Security Policies (Privileged, Baseline and Restricted).
On the upside these are rigid and the same on any cluster that implements them, so they are easy to test and to communicate to suppliers, teams and so on. The downside is that it’s unrealistic to assume your entire workload will run within the confines of the ‘restricted’ policy, since you explicitly cannot create exceptions to the policy without taking a very large step down to ‘baseline’, or worse ‘privileged’, when all you really wanted to do was grant a single control.
In addition to this, the controls are curiously levied on an entire namespace, and furthermore this is controlled by a label on the namespace rather than a formal update to the namespace spec, or even an annotation, which is normally how alpha and beta features are introduced to Kubernetes.
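For reference, opting a namespace into one of the standards is done with labels defined by the Pod Security admission controller; a minimal sketch (the namespace name is illustrative):

apiVersion: v1
kind: Namespace
metadata:
  name: my-team                                     # illustrative namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted  # reject pods violating 'restricted'
    pod-security.kubernetes.io/warn: restricted     # also surface warnings to clients
    pod-security.kubernetes.io/audit: restricted    # and record audit annotations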
If you’re a software vendor, then working towards meeting the Restricted standard is a great position to be in; it will give you a good head start for whatever unique configuration your customers have implemented and demonstrate that you have considered the security implications of your product.
If you are already using PSPs and require greater control over implementing policies, the next section of this article will show you how to migrate from PSPs to Kyverno, Kubewarden or Gatekeeper.
There is unfortunately no simple like-for-like mapping; PodSecurityPolicy and each of the replacements behave slightly differently, and for good reason too.
If you’ve been using PodSecurityPolicy for a while, you’ve likely developed some quite complex rules which have become entangled with the usability issues, so I would encourage you to take the opportunity to refactor and simplify rather than trying to continue what you’ve always done; you’ve got a shiny new tool to use after all.
That said, there has been some work to ease the transition by reproducing the key capabilities and even calling them the same things.
Short answer: use our whizz-bang-super-duper tool! Provide your existing PSP and take your pick of a corresponding policy in Kyverno, Kubewarden or Gatekeeper. Don’t we all wish migration was always this easy!
You can use it in a browser here: https://appvia.github.io/psp-migration/ or download a release and run it from the CLI. Below is an example PSP from the Kubernetes documentation, followed by its translation to a Kyverno ClusterPolicy.
# https://github.com/kubernetes/website/blob/main/content/en/examples/policy/example-psp.yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example
spec:
  privileged: false
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
    - '*'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: example
spec:
  rules:
    - validate:
        pattern:
          spec:
            "=(initContainers)":
              - "=(securityContext)":
                  "=(privileged)": false
            "=(ephemeralContainers)":
              - "=(securityContext)":
                  "=(privileged)": false
            containers:
              - "=(securityContext)":
                  "=(privileged)": false
        message: Rejected by psp-privileged-0 rule
      match:
        resources:
          kinds:
            - Pod
      name: psp-privileged-0
  validationFailureAction: enforce
The same example PSP translated to a Kubewarden ClusterAdmissionPolicy:
apiVersion: policies.kubewarden.io/v1alpha2
kind: ClusterAdmissionPolicy
metadata:
  name: example
spec:
  module: registry://ghcr.io/kubewarden/policies/pod-privileged:v0.1.9
  rules:
    - apiGroups:
        - ""
      apiVersions:
        - v1
      resources:
        - pods
      operations:
        - CREATE
        - UPDATE
  mutating: false
  settings: null
You can use the Gatekeeper Library to give you a head start; it is the basis for how our PSP migration tool generates Gatekeeper policy. Note that a constraint relies on the corresponding ConstraintTemplate from that library being installed on the cluster first. Here is the same example PSP translated to a Gatekeeper constraint:
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: example
spec:
  match:
    kinds:
      - apiGroups:
          - ""
        kinds:
          - Pod
  parameters: null
Feel free to give the tool a go online, or download it and run it locally. The tool is open source, and if you would like to see something added, let us know or create a pull request on the GitHub repo.
The accompanying table shows PSP fields and their counterpart conversions; it’s how we developed the tool.
So should you be reaching for cluster enforced policy at all? As a rule of thumb, I’d suggest the default answer should be ‘no’.
Using cluster enforced policy does not automatically mean everything is as secure as we hope. It is hard to know how effective a policy is without constantly monitoring the impact it is having and seeing how many requests are being rejected. For cluster enforced policy to be successful, all of the stakeholders need to buy in and understand why their requests get rejected.
We should always be thinking about shifting the process to the left so issues are caught as early as possible. Cluster enforced policy (alone) does not help with this: the request to create the resource has to reach the API server and be authenticated and authorised before it is accepted or rejected. To get to that point the request will already have gone through your deployment pipelines, and if it is then rejected, valuable time and resources have been wasted.
This is further exacerbated if your cluster enforced policy is based on the pod, since your Deployment/DaemonSet/StatefulSet/Job/CRD/etc. would be accepted by the API server and the policy, but the pods it tries to create would be rejected later, within the internal loops of the controller or other operators present on the cluster.
Kubernetes is already complex enough; adding policies to the cluster adds another layer of complexity that needs to be managed, maintained and documented to ensure smooth operation of your cluster.
I’m not saying you shouldn’t have policies and security controls; but it’s too easy to get carried away and reinvent all the painful bureaucracy that ‘devops’ promised we wouldn’t have to endure.
If you’re doing GitOps (and you absolutely should be), the only thing that can make changes to your cluster is the CI pipeline (and it absolutely should be; if it’s not, you must at least monitor and remove drift). The consequence of policy being evaluated and enforced only in the cluster is that you’ve accidentally committed the cardinal sin of DevOps: shifting all that responsibility to the right and making it harder to observe.
You’ll likely have broken main/master branches, thrashing rework, lots of pull requests with titles like “fixing policy fail”, and a need for access to be granted to the upstream cluster, among other things. Basically, all the stuff you were promised had been left behind in the past has come back to haunt you.
Cut to the chase Chris, what’s the answer?
Cluster policy is fine, but failures should cause alerts you’d be glad to be woken at 3am to deal with. They should represent genuine incidents, not just a developer trying to do their job, and exceptions should not need to exist only to then be granted.
Infrastructure as Code is the answer to all of this. Developers who write code solved this problem a long time ago with linters and static analysis, and they have a passion for running these fast and consistently: in their IDE, and in CI before any pull request review occurs, which assures that the team’s code all looks similar and conforms to some mutually agreed standards the team develops.
You can have as many policies as you like, as long as those policies are treated as code, committed into source control and versioned accordingly. This gives you a complete view of the policies in use in your cluster at any point in time, helps you understand the risk landscape, and lets you target which workloads should be supported and decide when to stop supporting a given policy version.
The key thing is that your developers should be able to evaluate against that policy locally, maybe even within their editor, and your CI pipeline and version control can enforce compliance.
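As a sketch of what that can look like, here is a hypothetical GitHub Actions job that runs the Kyverno CLI against the manifests in a repository (the workflow, file paths and policy names are all assumptions, not a prescription):

# .github/workflows/policy-check.yaml -- an illustrative workflow, not a prescription
name: policy-check
on: [pull_request]
jobs:
  policy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Install the Kyverno CLI binary (download step elided here; see
      # github.com/kyverno/kyverno/releases for the actual asset).
      - name: Install the Kyverno CLI
        run: echo "install the kyverno CLI here"
      # 'kyverno apply' exits non-zero when a resource violates a policy,
      # so the pull request fails long before anything reaches the cluster.
      - name: Evaluate the same policies the cluster will enforce
        run: kyverno apply policies/example-clusterpolicy.yaml --resource manifests/deployment.yaml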
Your cluster level policy failures should then be limited to real fires where something has gone drastically wrong, and a 3am alert will cause warranted panic and gratitude for the policy’s existence.
If you really feel there is a compelling reason to have lots of cluster enforced policy, then evaluate exactly the same policy in the cluster as your developers and CI systems do locally; don’t translate it into another language, run it as is. Anything else will lead you into problems and into testing against the lowest common denominator.
If you’re going to turn on warnings rather than outright rejections, be sure someone will actually see them. If your GitOps is even vaguely real, that won’t be a human in the first instance, so your teams will need to get those warnings in front of them; otherwise, what’s the point?
Teams deploying several times a day, or even a week, will realistically only look at the CI output containing those warnings when there’s a non-zero exit code and the build fails; if your warning results in a zero exit code then it’ll probably never be seen.
The problem with this as a process is that it’s all too easy for it to be subjective and emotive rather than led by real security threats.
In more extreme circumstances, I’ve witnessed first hand, at the weekly review board of a large and well-respected organisation, teams all getting the same person, who had been deemed ‘trusted’ by the board, to present their designs and justifications for security exemptions.
To be clear, he was doing nothing more than reading their slide decks; he didn’t influence or endorse the designs at all, it was just his voice and face that the board had learned to trust.
He was too nice of a guy to say ‘no’.
That hopefully seems far-fetched and unrelatable for your organization, and hopefully it is, but case-by-case exemptions lead to precedent being cited, and it’s really hard to undo that and to keep reliably supporting mutations of the policy alongside the baseline policy, which over time you’ll find few are actually using.
“Exemptions shouldn’t exist for people to just do their job.”
Well, that’s the ideal, isn’t it? But I can hear you all back in the real world screaming at your screens and pointing at the needs of your current estate. I will concede some of those may be legitimate, so what do we do about that?
For example, some off-the-shelf product needs to run as root, and you’ve decided you’d like to run the vanilla, supported product as is and focus on quickly applying patches from the upstream vendor, rather than building your own image, keeping up with patching and testing it, and then supporting it yourself. Great choice.
You’ve however also made a best-practice decision that containers shouldn’t run as root. Good on you, that’s also a great best practice to instil in your developers, but why? And, most importantly, do your developers know that too?
Ok, so we can see the consequences of, and the rationale for, the best practice, and we can determine what is often referred to as the ‘blast radius’: in other words, the amount of damage one could cause by exploiting that single avenue, taking into account any further ‘lateral movement’ (next steps) that could be made afterwards.
So who gets to decide if the risk is acceptable in this case?
And do you need to walk through that approval ceremony every time you update the package? Say yes and you’ll be slow to patch; say no and you’ll potentially hasten in an exploit.
Why are you even considering making all these exemptions while at the same time hurting all your developers, effectively sending a message that you trust this external vendor more than your own staff?
Some would argue that cluster policy sits at the point of execution, the last line of protection as it were. But when we actually look at the mechanics, the checks are run on the control plane; the worker node and container runtime are not checking anything, they assume the API server was ‘ok’ with it by means of static analysis and that it can therefore be trusted further down the chain. You couldn’t get much further from zero trust if you tried.
Fortunately the Linux kernel (the same marvel that brings us containers) provides a few capabilities for this:
seccomp (short for secure computing mode) is a computer security facility in the Linux kernel. seccomp allows a process to make a one-way transition into a “secure” state where it can only make limited system calls. Should it attempt any other system calls, the kernel will terminate the process. (from https://en.wikipedia.org/wiki/Seccomp)
AppArmor (“Application Armor”) is a Linux kernel security module that allows the system administrator to restrict programs’ capabilities with per-program profiles. Profiles can allow capabilities like network access, raw socket access, and the permission to read, write, or execute files on matching paths. AppArmor supplements the traditional Unix discretionary access control (DAC) model by providing mandatory access control (MAC). (from https://en.wikipedia.org/wiki/AppArmor)
SELinux is a set of kernel modifications and user-space tools that have been added to various Linux distributions. Its architecture strives to separate enforcement of security decisions from the security policy, and streamlines the amount of software involved with security policy enforcement. (from https://en.wikipedia.org/wiki/Security-Enhanced_Linux)
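Kubernetes exposes these controls directly in the pod spec; here is a minimal sketch (the pod name, container name and image are illustrative) that applies the runtime’s default seccomp and AppArmor profiles and refuses to run as root:

apiVersion: v1
kind: Pod
metadata:
  name: hardened-example                        # illustrative
  annotations:
    # AppArmor is (still) configured via an annotation keyed by container name
    container.apparmor.security.beta.kubernetes.io/app: runtime/default
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault                      # the container runtime's default seccomp profile
    runAsNonRoot: true                          # kubelet refuses to start containers running as UID 0
  containers:
    - name: app
      image: registry.example.com/app:1.0.0     # illustrative image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]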
However, managing them is not easy, so unsurprisingly lots of commercial products have entered the space with all sorts of buzzwords like ‘artificial intelligence’ and ‘machine learning’.
Relatively recently, a Kubernetes special interest group developed the Kubernetes Security Profiles Operator, which works to expose the power of seccomp, SELinux and AppArmor to end users. I’ll follow up with a blog post on this shortly, but in the meantime, to whet your appetite, you can read my notes in How to use the new Security Profiles Operator while I continue to work with the developers on ironing out a simple developer journey for users not on Linux workstations.
If you must, feel free to use the PSP migration tool. As discussed, to keep running workloads on a Kubernetes cluster simple, refrain from liberal use of pod-based cluster enforced policy and prioritise carrying out as many of those checks as possible in your CI/CD pipelines before looking to cluster enforced policy.