How to Avoid Day 2 Kubernetes Problems


Getting started with Kubernetes (K8s) is just the tip of the iceberg. In its simplest form, K8s can be easily managed, but it isn’t production-ready out of the box; there is a lot of knowledge and work required to get to that point if you’re planning to manage it in house. Because Kubernetes is fast-moving, with continually evolving best practices, it demands many hours from specialised engineers who are in high demand and expensive.

Day 2 operations in K8s, for anyone who’s unfamiliar, cover the time between the initial deployment of a cluster and when it is replaced with another iteration (or retired altogether). At the highest level, the process plays out like this: 

Day 0: Designing

Day 1: Deploying (creating Kubernetes itself) 

Day 2: Maintaining

Approaching Day 2 operations

Moving from Day 1 to Day 2 isn’t as simple as it might seem. Think of it like moving any technology out of staging and into production: organisations need to make sure that Day 0 and Day 1 phases are implemented with all of the best practices to lay a strong foundation for Day 2 operations. Day 2 then consists of all of the maintenance, management and monitoring of the Kubernetes platform. 

Organisations quickly come to the realisation that self-managed Kubernetes is full of complexities and challenges. You can offload some of these problems, like scaling and updating, by using a managed Kubernetes service from the cloud providers themselves, such as EKS or GKE. But these services still don’t provide a perfect solution that mitigates every potential problem.

Day 2 operations are critical to realising the potential benefits of Kubernetes and the reliability of the environment that has been created. Without effectively managing Day 2 operations, organisations will struggle to scale their environments and will put the entire infrastructure at risk. 

In order to avoid critical problems when you get to Day 2, make sure you have coverage on these fronts: monitoring, upgrading, security, networking and scaling.

Monitoring and Logging

Kubernetes itself doesn’t provide any sort of central application monitoring or logging straight out of the box, so if you’re managing Kubernetes in-house you’ll need to adopt a product or solution to solve this problem. 

Cloud Kubernetes management services offered by public cloud providers are not comprehensive. While the Kubernetes control plane is managed by the provider, the worker nodes are very much your responsibility. There’s significant operational overhead required to effectively manage monitoring and logging, and Kubernetes administrators need to be ever-present to handle any potential downtime of the logging solution itself. 
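To give a sense of what self-managed monitoring involves, here is a minimal Prometheus alerting rule that fires when a node’s kubelet stops reporting. This is an illustrative sketch only – it assumes Prometheus is already scraping kubelets under a job named `kubelet`, and the alert name, threshold and severity label are examples, not recommendations:

```yaml
# Illustrative Prometheus alerting rule (assumes a scrape job named "kubelet").
groups:
  - name: node-health
    rules:
      - alert: NodeDown
        # "up" is the built-in metric Prometheus records per scrape target:
        # 0 means the last scrape failed.
        expr: up{job="kubelet"} == 0
        for: 5m                      # only fire after 5 minutes of failures
        labels:
          severity: critical
        annotations:
          summary: "Kubelet on {{ $labels.instance }} has been unreachable for 5 minutes"
```

Rules like this are exactly the kind of low-level configuration that a managed product can ship pre-tuned, so teams don’t maintain it themselves.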

To reduce this overhead in a cost-effective way, look for a product that comes already equipped with monitoring. Appvia Wayfinder reduces noise from unnecessary, low-priority alerts so that teams don’t get bogged down in unimportant details. With Wayfinder, you can also configure important alerts to be sent where teams will have the most visibility – like Slack or a team ticketing system – so that incidents are quickly resolved. 

Upgrading

There are tons of tools available to help ease the process of upgrading clusters, each one of them managing upgrades differently. The choice you make around upgrading is essential to making sure there’s no downtime to hosted applications within your cluster.
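One common building block for avoiding downtime during upgrades – whatever tooling you choose – is a PodDisruptionBudget, which tells Kubernetes how many replicas of an application must stay running while nodes are drained. A minimal sketch, where the `web` label and replica count are illustrative assumptions about your workload:

```yaml
# Illustrative PodDisruptionBudget: keeps a floor of replicas up
# while nodes are cordoned and drained during an upgrade.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb            # example name
spec:
  minAvailable: 2          # never evict below 2 running pods
  selector:
    matchLabels:
      app: web             # assumes your Deployment's pods carry this label
```

With this in place, a node drain that would take the application below two pods is blocked until replacement pods are scheduled elsewhere.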

Cluster administrators also need to factor in version skew support when planning upgrades, to make sure that control plane component versions will be compatible with each other if a manual upgrade is actioned. Each Kubernetes release may introduce breaking changes to its components, so it’s important to identify any applications within your cluster that may be affected – these might require additional configuration changes before you commence an upgrade.

Some cloud-managed Kubernetes services, such as GKE, mitigate this problem by automatically upgrading control plane components without any user interaction, so administrators won’t need to worry about it. But organisations managing their own clusters will need to establish a repeatable upgrade process so that clusters can be upgraded consistently, on a schedule.

Security

Making intelligent choices on how to handle security is yet another layer of engineering days and due diligence required to ensure that security risks are minimised. And still, as hard as you might try, you’re bound to run into situations that are impossible to predict – your infrastructure needs to be prepared for anything. Staying on top of security is a full-time job.

Maintaining those security practices is a costly effort that needs to be constantly re-evaluated and controlled. There is a wide range of security options to consider – from private or public Kubernetes API access, to node and workload security access modes, to network policies – and each requires a measured choice.
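Network policies are a good example of a measured choice with a sensible default. A common baseline is to deny all inbound traffic to a namespace and then allow specific flows explicitly. A minimal default-deny sketch – the namespace name is illustrative, and this only takes effect if your cluster runs a network plugin that enforces NetworkPolicy:

```yaml
# Illustrative default-deny ingress policy for one namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a        # example namespace
spec:
  podSelector: {}          # an empty selector matches every pod in the namespace
  policyTypes:
    - Ingress              # with no ingress rules listed, all inbound traffic is denied
```

Teams then layer narrower allow policies on top for the traffic each application actually needs.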

Whichever way you build your Kubernetes infrastructure, there will still be third-party services that need to be managed. Each additional party brings added security risks and single points of failure (SPoFs), creating even more pressure on development teams. 

The more automated your platform, the more secure it is, because security has been built in from the start. When utilising any management product, best practices should be woven into the entire offering. When security isn’t implemented properly at the beginning, organisations make themselves vulnerable by unintentionally exposing their infrastructure to breaches.

Scaling

Without autoscaling, using the cloud is inefficient. If you provision fixed capacity for peak demand, you lose the pay-as-you-go benefit of the cloud and pay for idle resources whenever demand subsides. 

As with security, there are a number of options for installing and configuring autoscaling, and each requires time and expertise. The option you choose dictates how Kubernetes knows when your application is in demand – for example, a Horizontal Pod Autoscaler scales pods based on observed metrics like CPU utilisation. If autoscaling is appropriately configured, you won’t lose data or security measures when scaling down. If not, you will have a host of problems on your hands.
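As a concrete sketch of the Horizontal Pod Autoscaler option, the manifest below scales a Deployment between two and ten replicas based on average CPU utilisation. The Deployment name, replica bounds and 70% target are illustrative, and CPU-based scaling assumes the metrics-server add-on is installed in the cluster:

```yaml
# Illustrative HorizontalPodAutoscaler (requires metrics-server for CPU metrics).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # example name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # assumes a Deployment called "web" exists
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%
```

Pairing pod-level autoscaling like this with a cluster autoscaler for nodes is what lets spend track demand in both directions.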

Platforms scale, people don’t.

When companies begin to scale using Kubernetes, they often end up with multiple versions and configuration variations, making it difficult to maintain a clear view of security or best-practice consistency across all clusters and environments.

Filling in the gaps

Wayfinder is designed to alleviate Day 2 Kubernetes problems by removing the complexities and operational overhead of building your own system. Wayfinder allows you to self-serve Kubernetes, so that you can take advantage of industry-standard best practices right out of the gate. 

What teams experience with Wayfinder:

Monitoring: Alert teams to things that matter, like failing nodes or restarting application pods, and reduce noise from unnecessary alerts to keep teams on track. 

Upgrading: Automatic upgrades with maintenance windows, so that teams have secure, patched, up-to-date clusters.

Security: Security best practices, including network policies and access controls, are built-in as well as user and team access management of clusters so that security risks are minimised right from the start.

Scalability: Autoscaling so applications can scale up and down to meet demand.

About Appvia

Appvia enables businesses to solve complex cloud challenges with products and services that make Kubernetes secure, cost-effective and scalable.

Our founders have worked with Kubernetes in highly regulated, highly secure environments since 2016, contributing heavily to innovative projects such as Kops and fully utilising Kubernetes ahead of the curve. We’ve mastered Kubernetes, and experienced its complexities, so our customers don’t have to. 

Vincent Lam
SOLUTIONS ENGINEER

