Google’s Kubernetes Engine (GKE) is easy to get going with, but it requires additional security controls. The documentation can be hard to grasp: there are many features and changes tied to specific Kubernetes versions that require beta feature enablement, as well as some out-of-date documentation that can catch people out.
There is also an outstanding bug that the Appvia team have raised on Kubernetes that is hit by GKE. This is an edge case that only occurs if you run Pods directly rather than Deployments, and only if you have enabled pod security policies (which we recommend doing for security reasons).
If you are looking at production and have sensitive workloads, we advise that you implement the steps below.
So what’s the problem with securing your Kubernetes clusters?
When you deploy a default Google Kubernetes Engine cluster with no additional options provided:
gcloud container clusters create <NAME_OF_YOUR_CLUSTER>
you’ll get some sensible security defaults out of the box.
So why did Google enable these things by default? Essentially it means:
The things you need to make sure you enable:
1. Enable Secure Boot to validate the authenticity of the operating system and kernel modules. (If you decide not to use the default OS, this might cause issues.)
Using the virtual Trusted Platform Module (vTPM) to sign kernel images and the operating system means that authenticity can be established. It also guarantees that nothing has been tampered with, such as kernel modules being replaced with ones containing malware or rootkits.
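As a sketch, Secure Boot can be enabled at cluster creation time with the Shielded Nodes flags (the cluster name is a placeholder):

```shell
# Create a cluster with Shielded GKE Nodes, Secure Boot and integrity monitoring
gcloud container clusters create <NAME_OF_YOUR_CLUSTER> \
  --enable-shielded-nodes \
  --shielded-secure-boot \
  --shielded-integrity-monitoring
```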
2. Enable intranode visibility so you can see the data flowing between pods and nodes
Making sure that all traffic is logged and tracked between pods and nodes will help you identify any potential risks that may arise later on. This isn’t necessarily something you need to do for development, but it is something you should do for production.
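Intranode visibility is a single flag at cluster creation time; a minimal sketch:

```shell
# Enable intranode visibility so pod-to-pod traffic on the same node
# is exposed to the VPC's flow logs and firewall rules
gcloud container clusters create <NAME_OF_YOUR_CLUSTER> \
  --enable-intra-node-visibility
```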
3. Put the Kubernetes API on a private network
4. Put the node pool on a private network
5. Provide a list of authorised networks that should be allowed to talk to the API
Making your nodes and Kubernetes API private means they aren’t subject to the network scans that are happening all the time from bots, hackers and script kiddies.
Putting it behind an internal network that you can only access via a VPN is also good; however, this is a much more involved process with GKE and isn’t as simple as a feature flag like the others.
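The three steps above can be sketched as a single create command (the CIDR ranges below are placeholder assumptions and should be replaced with your own):

```shell
# Private nodes, private control-plane endpoint, and an allow-list of
# networks permitted to reach the Kubernetes API
gcloud container clusters create <NAME_OF_YOUR_CLUSTER> \
  --enable-private-nodes \
  --enable-private-endpoint \
  --master-ipv4-cidr 172.16.0.0/28 \
  --enable-master-authorized-networks \
  --master-authorized-networks 10.0.0.0/8
```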
Again, Google enables some default features to get you started, however, there are still gaps that you will need to fill.
What do you get?
What you will want to enable:
- Pod Security Policies. This is Beta and requires you to inform Google that you’re wanting to run the cluster creation in beta mode:
gcloud beta container clusters create <YOUR_CLUSTER> --enable-pod-security-policy
- Network Policies. Unlike pod security policies, this doesn’t require the beta track:
gcloud container clusters create <YOUR_CLUSTER> --enable-network-policy
This sounds great, but what does it actually mean? Basically, responsibility for some of these features is split in two: part is for application development teams to own, and the rest is for the cluster administrator:
Kubernetes can have a bit of a learning curve. There are technologies, such as Helm, that make it simpler in terms of dependencies, allowing you to deploy application dependencies with pre-defined deployments. But there is no real substitute for understanding the main components of Kubernetes: Network Policies, Ingress, Certificate Management, Deployments, ConfigMaps, Secrets and Service resources.
The main security components are network policies, secrets and certificate management. Network policies allow you to control the traffic to and from your applications. Secrets are only base64 encoded, so there is no real security in terms of how they are stored; therefore, making sure the cluster administrator has enabled secret encryption (as mentioned further down) will add that additional layer.
Certificate management will make sure the traffic to your service is encrypted, but if you’re communicating between services, then you should also add TLS between your applications. Having the cluster administrator install something like cert-manager allows an easier way to encrypt traffic between services. There are also services like Istio, but as that product does a lot more than just certificates, it can add more complexity than necessary.
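As an illustration of a network policy, the sketch below (all names and the port are hypothetical) only allows ingress to a backend application from pods labelled as its frontend:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only   # hypothetical name
  namespace: my-app           # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: backend            # the policy applies to pods labelled app=backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend   # only frontend pods may connect
      ports:
        - protocol: TCP
          port: 8080
```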
You want to make sure that development teams can’t deploy insecure applications or make attempts to escalate their privilege, mount in devices they shouldn’t or make unnecessary kernel system calls. Pod security policies offer a way to restrict how users, teams or service accounts can deploy applications into the cluster, enforcing a good security posture.
Role Based Access Controls and Pod Security Policies go hand in hand. Once you define a good pod security policy, you then have to create a role that references it and then bind a user, group and/or service account to it, either cluster wide or at a namespace level.
Things to note: GKE uses an authorisation webhook that is consulted before Kubernetes RBAC. This means that if you are an administrator inside Google Cloud Identity and Access Management (IAM), it will always make you a cluster admin, so you can recover from accidental lock-outs. This is abstracted away inside the control plane and is managed by GKE itself.
We recommend the below as a good PSP. This will make sure that users, service accounts or groups can only deploy containers that meet the criteria below: anyone or anything bound to this policy will be restricted in the options they can provide to their running applications.
# Example PSP reconstructed around the restrictions described above;
# the name "restricted" is an assumption
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
  annotations:
    # Define seccomp policies to only allow default kernel system calls
    seccomp.security.alpha.kubernetes.io/defaultProfileName: runtime/default
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: runtime/default
spec:
  # Prevent containers running as privileged on the node
  privileged: false
  # Prevent containers escalating privilege
  allowPrivilegeEscalation: false
  # Prevent using the host process network namespace
  hostPID: false
  hostIPC: false
  hostNetwork: false
  # Don't allow containers to run as root users inside of them
  runAsUser:
    rule: MustRunAsNonRoot
  # Don't allow containers to have privileged capabilities on files, allowing
  # files to potentially run as root on execution or run as a privileged user
  requiredDropCapabilities:
    - ALL
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Only allow the following volumes
  volumes:
    - configMap
    - secret
    - emptyDir
    - projected
    - downwardAPI
    - persistentVolumeClaim
If you wanted to create a role to use the PSP defined above, it would look something like the below; this is a cluster role as opposed to a standard role. To then enforce it on, say, all authenticated users, you would create a role binding applying it to the “system:authenticated” group.
# Role linking to the defined PSP's above
# (resource and binding names are assumptions)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-restricted
rules:
  - apiGroups:
      - policy
    resources:
      - podsecuritypolicies
    resourceNames:
      - restricted   # must match the metadata.name of the PSP
    verbs:
      - use
---
# Role binding to enforce PSP’s on all authenticated users
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-restricted
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp-restricted
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: system:authenticated
Remember that as this is cluster wide, any applications that need more privileged permissions will stop working; some of these will be things Google adds into Kubernetes, such as kube-proxy, which runs in the kube-system namespace.
We can break this down into two sections:
The recommendation for encrypting secrets using Google Cloud’s KMS service is to have segregated roles and responsibilities, and to define a new Google project outside of the Google project that will host Kubernetes and the applications. This is to make sure the encryption keys, and more importantly the key that signs the other keys (envelope encryption), don’t reside in the same project that could potentially get compromised.
For encrypting secrets you need to:
The documentation on how to do this can be found in Google’s docs. The main things to remember are:
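As a sketch, the key setup might look like the following; the project, keyring and key names are placeholders, and the GKE service agent of the cluster’s project must be granted use of the key:

```shell
# Create a keyring and key in a separate project dedicated to key management
gcloud kms keyrings create gke-secrets \
  --project <KMS_PROJECT> \
  --location <LOCATION>
gcloud kms keys create gke-secrets-key \
  --project <KMS_PROJECT> \
  --location <LOCATION> \
  --keyring gke-secrets \
  --purpose encryption

# Allow the GKE service agent of the cluster's project to use the key
gcloud kms keys add-iam-policy-binding gke-secrets-key \
  --project <KMS_PROJECT> \
  --location <LOCATION> \
  --keyring gke-secrets \
  --member serviceAccount:service-<CLUSTER_PROJECT_NUMBER>@container-engine-robot.iam.gserviceaccount.com \
  --role roles/cloudkms.cryptoKeyEncrypterDecrypter
```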
Once this is set up, you can pass the full resource path of the key to the create command:
gcloud container clusters create <YOUR_CLUSTER> \
  --database-encryption-key projects/<KMS_PROJECT>/locations/<LOCATION>/keyRings/<RING>/cryptoKeys/<KEY>
If any of the above is incorrect, you will get a 500 internal server error when you go to create the cluster. This could mean that the path is incorrect, the location is wrong, or the permissions are not right.
There are four different ways to allow application containers to consume cloud services they might need (e.g. object storage, database as a service) inside of Google Cloud. All of these have limitations in some way, i.e. being less user friendly and unautomated for developers (making developers wait for the relevant access to be provisioned), or having less visibility and more complexity in tying together auditability for Kubernetes and cloud administrators.
Less Developer friendly (unless automated) or Introduces Audit Complexity:
1. Workload Identity (this is still in Beta and is the more long-term direction Google is going in):
This requires a constant process of managing Google IAM as well as Kubernetes service accounts to tie them together. This means managing Google IAM roles, policies and service accounts for specific application service lines, as well as Kubernetes service accounts. It does however improve auditability.
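A sketch of tying the two together; the service account, project and namespace names are all placeholders:

```shell
# Allow the Kubernetes service account to impersonate the Google service account
gcloud iam service-accounts add-iam-policy-binding <GSA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<PROJECT_ID>.svc.id.goog[<NAMESPACE>/<KSA_NAME>]"

# Annotate the Kubernetes service account to complete the binding
kubectl annotate serviceaccount <KSA_NAME> \
  --namespace <NAMESPACE> \
  iam.gke.io/gcp-service-account=<GSA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com
```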
2. Google Cloud service account keys stored as secrets inside Kubernetes namespaces (where the application will be living):
Similar to the above, but without the binding. This means provisioning a service account and placing its key as a secret inside the application’s namespace for it to consume natively. This has the downside of not having full auditability across Google Cloud and Kubernetes.
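A minimal sketch of this approach (account and namespace names are placeholders):

```shell
# Create a key for the Google service account and store it as a Kubernetes secret
gcloud iam service-accounts keys create key.json \
  --iam-account <GSA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com

kubectl create secret generic gcp-service-account \
  --namespace <APP_NAMESPACE> \
  --from-file=key.json
```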
3. Use something like Vault to broker between Cloud and Applications.
Using Vault provides an abstraction over the cloud and will generate short-lived access keys for applications. There is still a secret required to be able to speak to the Vault service, however; the permissions are the same but abstracted down one level. It also disjoints auditability between Google Cloud and Kubernetes.
More Developer Friendly:
4. Using the default service account node role of GKE (Google Kubernetes Engine)
Much simpler, but more risky. It would mean allowing applications to use the default node service account and modifying the role to cater for all the service policies applications would need, increasing the scope and capability of the node service account to most Google Cloud services.
Note: As of today, there is no access transparency on when Google accesses your Kubernetes cluster (Google Access Transparency). This could be problematic for a lot of organisations who want assurances around the providers data access.
When the cluster is provisioned, all audit logs and monitoring data are pushed to Stackdriver. As the control plane is managed by Google, you don’t get to override or modify the audit format or log, or extend it to your own webhook.
It does however mean that you can search your audit logs for things happening inside of Kubernetes in one place. For example, to query all events against a cluster inside a specific Google project against your user ID, you can do the below:
gcloud logging read --project <your-project-id> \
'timestamp>="2020-02-20T09:00:00.00Z" AND \
resource.type = k8s_cluster AND \
resource.labels.cluster_name = "<your-cluster-name>" AND \
protoPayload.authenticationInfo.principalEmail = "<your-google-email>"'
From this point on you could take it further and add additional security rules, like setting custom metrics to alert when cluster admin changes are made or specific modifications happen on roles (cluster-admin roles).
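As a sketch, a log-based metric counting changes to cluster-wide role bindings might look like the below; the metric name is a placeholder and the filter is an assumption that will likely need tuning for your environment:

```shell
# Create a log-based metric for ClusterRoleBinding changes,
# which can then drive a Stackdriver alerting policy
gcloud logging metrics create cluster-admin-changes \
  --project <your-project-id> \
  --description "ClusterRoleBinding create/update events" \
  --log-filter 'resource.type="k8s_cluster" AND protoPayload.methodName:"clusterrolebindings"'
```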
Security is something everyone wants, but as you can see it can be quite an involved process to achieve. It also requires domain knowledge to get enough context to assess the risk and what it might mean to your applications and business.
Security without automation can slow down productivity; not enough security can put your business at risk. Enabling security features that require “Beta” feature enablement may also not be suitable for the business, and if only General Availability features are acceptable, that compromises on security.
As a general rule, hardening your clusters and enforcing secure ways of working with Kubernetes, containers, cloud services and your applications will get you the best outcome in the end. There may be frustrating learning curves, but as the industry matures, these will slowly be remediated.
Let us know how you have been doing security with Google Kubernetes Engine, Vanilla Kubernetes or other cloud vendor offerings.