Make Production Fun Again: Interview with Carlos Nunez

Joel and Carlos go WAY back. And, would you believe that Carlos has moved from a Kubernetes hater to full-blown evangelist? Dig into his story, how he advises teams, and how he aims to make production fun again.

To keep the conversation going, join our Slack community and follow along on Twitter


Welcome to Cloud Unplugged, the podcast where we take a light-hearted, honest look at the who, what, when, where, why, how and OMGs of cloud computing. In today’s episode: an interview with Carlos Nunez, one of my friends, collaborators, former co-workers, and just an all-round good guy. And I think he has a lot of valuable things to add to the conversation around Kubernetes. So with that, let’s get straight into the interview.

So doing these interview episodes has been a lot of fun for me. It gives me an opportunity to catch up with people that I’ve known for a long time. And the person I have joining me today definitely fits that category. I’ve known him for several years; we worked together previously in a consulting capacity, and we have some shared battle scars from a couple of wild and wacky engagements. And he’s one of those people that you meet along the way who is just extremely intelligent, extremely good at what they do, and somebody you’re like, I want to keep track of this person. So with that, let me introduce Carlos Nunez.

Hey, well, thanks for the kind words, Joel, I really appreciate it. I’m glad I fit in that category of people. 

Yeah, all true. You’re in my address book for life.

 You’re in mine as well, so I’m glad we share that! 

Yeah, you’re on my list for life. So you know, just, adjust.

Imagine you meet someone along the way and you’re like, yeah, we definitely met, you’re on my speed dial. And they’re like, what’s your number again?

Can you take me off?

Anyway, so we worked together in a consulting mode at a couple of different places. And then these days, you’re doing something that honestly, I would love to chat about at least a little bit. What are you up to these days, Carlos?

Well as my job title states, I’m up to being a Solutions Architect now. But as you very well know, solutions and architect are two very broad words that can mean a lot of things. 

Oh boy, can they.

Yeah, yeah. So right now, a lot of my work is helping teams get to production. That’s what I’ve been doing for the last few years. And when I say that, it means a few things. The first is helping teams be less afraid of production, which usually means being able to test their applications locally in a way that, when they get their application to production, they’re not as worried about stuff breaking. But it also means understanding the platforms underneath production, the software running in production, and the dependencies they rely on before they can get to production. Really understanding that entire chain and the entire life cycle that their application gets released onto, and then usually writing it in code somehow.

So that normally means, in my job, pairing with one or more engineers at a client and teaching them some new tools, teaching them some new tricks, teaching them how to own platforms, or how to better understand their platform, as well as how to apply modern software development practices. So things like writing unit tests, optimising unit tests, refactoring unit tests so that they don’t just assert that true is true or provide very little value. Or helping them write tests that are fast, so that they can write more unit tests, helping them understand integration tests and component testing.

You know, helping them with some refactoring patterns, that sort of stuff. So it’s an all-encompassing job, and it usually involves something with Kubernetes these days. Which I think brings us to the meat of the episode.
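The "assert true is true" anti-pattern Carlos mentions is easy to picture; here is a minimal sketch in Python (the function and test names are invented for illustration, not from the interview):

```python
def add_tax(price: float, rate: float = 0.08) -> float:
    """Hypothetical function under test: apply a tax rate to a price."""
    return round(price * (1 + rate), 2)

# A low-value test: it passes no matter what add_tax actually does.
def test_tautology():
    assert True

# A fast, meaningful unit test: it pins down real behaviour,
# so a refactoring mistake in add_tax would actually fail it.
def test_add_tax_applies_default_rate():
    assert add_tax(100.0) == 108.0
```

Refactoring a suite of the first kind into the second kind is the sort of pairing work described above.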

Yeah, the penny just dropped, the potentially bad penny in some cases. Yeah, definitely. I wanted to talk to you about Kubernetes, and specifically your experience with it. Because both of us have been around Kubernetes for what feels like ages at this point, going back to the extremely early days of the project. And it’s been fascinating to watch it mature. It’s also been fascinating to watch what’s happened with it, as it’s gone from something that was relatively experimental, to something that was becoming very mainstream and widely adopted, to something that now is starting to feel very commodity.

And it’s been interesting to watch the path of this particular project from start to finish. So with that in mind, how did you get to Kubernetes? What was your first exposure? And I would really love to know your initial reactions when you first started messing with it and getting accustomed to it. What did you think?

It’s almost like you’re asking the question knowing kind of what I’m gonna say. But I actually haven’t tracked Kubernetes since its inception, though I wish I had. Because I was looking at the earlier commits from Joe Beda and Craig McLuckie and those folks, and it was really interesting how it developed, like how it matured. Something I didn’t realise was that in the early days Kubernetes was a single binary, and all the components that make up Kubernetes, the scheduler, the proxy, the controller manager, all of that was a single executable. And seeing the project diverge from having everything all in one package to being several different components, and then seeing other teams, like I think the team at Rancher and some other independent developers, bringing it all back into one binary to be optimised. I think seeing history repeat itself there has been really cool.

But I started using Kubernetes in, I want to say, 2018 or 2019. And when I say I’ve been using it, it hasn’t been in the enterprise kind of ways that I’ve been using it with clients these days. When I first used it, I was trying to do the ‘Kubernetes the Hard Way’ tutorial, a series of labs from Kelsey Hightower. And I did it just to get a better understanding of, what is this Kubernetes thing that people keep complaining about, or people keep blogging about? And so my first exposure was through setting up a cluster with Kubernetes the Hard Way, from scratch. And what I learned very quickly is that it is awfully complicated. It has all the moving parts, literally all of them. There is so much going on. It’s sometimes abstracted away from you, if you’re using EKS or GKE or something like that.

“I didn’t get that” 

Nobody asked you, Siri.

Yeah, no one… she’d probably give the wrong answer anyway. But yeah, there’s a lot of moving parts. And the schema for creating resources in Kubernetes, and managing them, that entire block of YAML, can be really intimidating. So for me, my first exposure was doing that and then getting turned off by it. Because I just didn’t want to. I was using Docker Compose, had been using it for years up to this point, and going from Compose to THAT was not something I really wanted to do. Until I joined a consultancy in 2020 where I had been asking to do more Kubernetes work just to further solidify my dislike for it. I wouldn’t say hatred, that was reserved for something else. My dislike for it. And I was put onto a client who was using Azure Kubernetes Service (AKS) pretty extensively. And that’s where I got to really see the benefits of Kubernetes, especially for teams who are 100% working with Docker and containers. And even though I’d been using Compose for a long time, I’d used Swarm and Docker Datacenter, when that was the thing. Even though I’d used these tools, really seeing what Kubernetes can do with orchestrating containers and treating your cluster like a data centre, that was really, really, really cool. And it clicked for me that all the complexity that I was seeing made sense, and there are reasons for it. And that there have been attempts to make that logic simpler, but usually those attempts come with compromises that you may or may not see later down the line, and you might get burned on. In the case of Docker Datacenter, it definitely happened to one of my teams. But yeah, I really grew to appreciate Kubernetes, and now at VMware, I’m working on a product suite called Tanzu which is entirely around Kubernetes and the Kubernetes ecosystem, and trying to help teams embrace modern development practices by using Kubernetes as the destination for those applications.
And getting to work with some of the founders of the Kubernetes project, and with some people who are heavily involved. Getting to work with it on a day-to-day basis in a deep way has only solidified my appreciation for the project and the people that work on it. So, yeah, that’s my long story short with Kubernetes. But I have been using Docker for many, many years, since like 2017 or something like that.

Yeah, we’ll come back to Docker Datacenter in a second, because something you said was pretty interesting. But I would say from your early experiences with Kubernetes, obviously, you first evaluated it, looked at it and said, oh my gosh, this is a bag of rabid cats. What the heck am I doing with this? And then at a certain point, something clicked. And you saw the value. What was the breakthrough? What really made it click for you? I think the listeners would be interested to know.

So I was writing a post on Reddit about this just yesterday, actually. So I’m glad to talk about it again, because I think a lot of people are seeing this now that they are starting to use Docker containers in production. To explain what the breakthrough was for me, I have to explain a big part of what I do with the teams that I work with. And that’s getting them to run their applications in containers, right. And the reason why I spend so much time on that is because of what containers aim to solve. The idea behind containers is that not only can you run the application in any environment that can run your container runtime, but you can also capture and encapsulate that application’s environment and configuration. That way, you’re not spending time, you know, writing Chef cookbooks or Ansible playbooks or something like that to try and set up this pristine capital-P Production environment that cannot change, and can only be changed by the deities that have the power to do so, and then hoping that your application runs on it. Instead, you can have the entirety of what the application needs, and only that, in a single image, and then have your container runtime create a container from that image anywhere. And I spend a lot of time on that because for a lot of the teams that I work with, their main concern with production is, you know, I don’t know what production looks like. Or if I do, I can’t set up that stuff on my computer. So on my computer, I’m going to maintain this local development environment, and I’m going to write documentation on how to set it up. Or that’s actually usually a luxury; sometimes someone sets it up, and then, you know, they have it running, and then there’s no time to read the documentation. So there’s this long training period, hoping the person that knows how to set it up pairs with the right person to set it up.
But having this bespoke local development environment that’s hard to replicate doesn’t mirror at all what the application’s environment is going to look like. So I spend a lot of time trying to fix that by saying, what does production look like? And let’s get that thing into Docker. And let’s get your application running in it. So your tests run in something that looks like production, and your application runs in something that looks like production, on your laptop. One colleague that I worked with, the way he expressed this was: I’m gonna sit down on this chair, in front of your computer, and I’m going to ask, how do I ship this app into production? How do I cut a new release? Not knowing anything about your team, anything about your application, anything at all. How do I get this into production? Getting that into Docker, helping teams really do that exercise, has really helped them be able to answer that question, right? So for me, the big breakthrough was that Kubernetes, or container orchestration, is really similar in the sense that you can have a production-like platform running on your laptop; you can deploy to Kubernetes on your laptop. And that same Kubernetes, of course with some additional extensions, we can talk about it, Kubernetes on your laptop, or Kubernetes on someone else’s computer, or Kubernetes in AWS, or Kubernetes on-prem on bare metal, all of that is the same. And I guess the real big breakthrough for me was when I started using Kind, which is Kubernetes in Docker, and seeing just how easy it was to get Kubernetes set up that way, and then being able to deploy applications into Kubernetes just like I would through CI. Once I saw that, okay, I have a production-like data centre OS thing running on my laptop that I can push my application into, and that’s going to be the same thing as what’s going to be in prod.
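For anyone who wants to try the Kind workflow Carlos describes, a minimal setup might look like this (the cluster name and node counts are purely illustrative):

```yaml
# kind-config.yaml, used with:
#   kind create cluster --name local-dev --config kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```

The point of the exercise is that the manifests you `kubectl apply` to this laptop cluster are the same ones CI would apply to a managed cluster.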
For me it just reduces the friction of getting applications into prod. And anything that can help with that helps my mission of making production fun again for teams. So that was really the big, oh my God, this is awesome moment, and why I’m so excited to work with it.

So I have kind of a follow-up question to that. That was the aha moment for you, and I think it’s probably the aha moment for a lot of people, specifically with containers and Kubernetes: the repeatability of processes. It just makes everything, you know, all the if-thens that we used to have in CI pipelines, if this environment, then this, all of that kind of goes away, and everything becomes a much more linear paved road to where you want to get. The interesting thing there is, if you had to explain beyond that, once you get to production, once you have something running in Kubernetes, if you had to explain to somebody, this is what it actually gets you, this is the killer feature, right? Once it’s got your workload and says, Roger, Roger, it’s running. What is that? How would you explain that? Since you mentioned Reddit, explain it like I’m five.

Yeah. So I was trying to figure out how to encapsulate the big killer feature that Kubernetes provides in, like, one word, and what word would describe all of them, so I’ll just talk about one of the killer features. Networking. With Kubernetes, you can create services that allow you to discover the applications that are running inside of your clusters. And by the way, those applications can be partitioned through namespaces, which is already kind of a challenge in normal, non-Kubernetes environments; trying to establish tenancy, or hard tenancy like that, is pretty difficult.

Yes, segregation of concerns or workloads.

Precisely. And so networking is something that Kubernetes gives you a lot of benefits for, for free, right? So you get service discovery, or at least basic service discovery, for free. Normally, you’d have to install Consul, or some service like that, in order to get that. And then, you know, once you start installing more tools, that means more overhead, more process, more things that can break. So you get service discovery for free, or at least basic service discovery. And then you get layer-seven, rule-based load balancing for free, right, through Ingress. And that’s normally something that, if you’re not working with Kubernetes, I know that there are Ingress controllers that work with application load balancers, or the Azure Application Gateway, right? I know that those exist, but the fact that you can create paths and then route them to different services that you maintain, that just comes with Kubernetes. Instead of having some sort of application gateway that you need to install and manage. Like, what’s the one that’s escaping me, Apigee or something like that? You can just use Ingress and then an Ingress controller, and then you’re kind of good, right? For, I’d say, like 80% of the workloads that I normally work with.
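As a sketch of the two freebies Carlos lists, in-cluster service discovery plus rule-based layer-7 routing, the Kubernetes resources involved look roughly like this (all names, namespaces and ports are illustrative):

```yaml
# A Service makes pods discoverable by a stable name
# (my-app.my-namespace.svc.cluster.local inside the cluster).
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
---
# An Ingress adds path-based, layer-7 routing on top of Services.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-namespace
spec:
  rules:
    - http:
        paths:
          - path: /my-app
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```

An Ingress controller (NGINX, Contour, a cloud one) still has to be running in the cluster to satisfy the Ingress, but the routing rules themselves are just declarative resources like everything else.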

What I’m trying to get at here is that there are a lot of features within Kubernetes that would normally require different tools and different processes to manage in a non-Kubernetes environment. And the fact that you can have all of that wrapped in means that you stay in a single ecosystem that’s easier to train people on, easier to get people to use, easier to manage, and easier to contain it all. And I think that’s something that, for me, is a huge benefit that Kubernetes brings.

Yeah, I think if there’s a killer feature of Kubernetes, it’s something that kind of flies under the radar. It’s a design assumption that’s baked into Kubernetes, and the design assumption is that everything is dynamic. And that flies in the face of so many things that we thought for years about how to configure secure systems, which was: you locked it down, nothing moved; dynamism was sort of the opposite of what we were going for. And Kubernetes says, no, everything is dynamic, and everything is configured basically on the fly. And that is such a breath of fresh air when you think about what people are actually trying to do with applications: being very, very nimble, being able to make changes on the fly, abstracting away some of the low-level complexity, being able to scale up and scale down really, really quick. All of that stuff. To be able to have the entire application environment sort of flex and reconfigure on request, very dynamically, is, frankly, staggering. It still amazes me that it works, to be honest with you. It’s pretty amazing.

And I think you touched on something else that I really enjoy about Kubernetes that is more difficult to implement in a non-Kubernetes workspace. The fact that you can just scale up and scale down through deployments, and just run a single command, and all of a sudden you have, like, 10 replicas, or none. And you can wire up metrics that will automatically do that auto-scaling for you. I just think that getting that for free, regardless of where your Kubernetes is running, is super cool.
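The single command Carlos has in mind is something like `kubectl scale deployment my-app --replicas=10`, and the metrics-driven version is a HorizontalPodAutoscaler; a sketch, with an invented deployment name and thresholds:

```yaml
# Scale my-app between 2 and 10 replicas, targeting ~70% average CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The autoscaler does assume a metrics source (typically metrics-server) is installed in the cluster.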

Yeah, I mean it’s really, really amazing. I will say this, though; I know that you know the answer to this question, so it’s a little bit leading. But what about the application that’s running in the container? We talk about being able to rapidly scale up and scale down, but what do we have to do to the application? If I’m Joe ‘business lead’ at a customer, are we going to have to do anything to the application? Can it stay as is? Can we just take what we have, shove it in a container, give it to Kubernetes and, sort of, profit?

I mean, it’s funny, because in a lot of cases lately, and I’ve got to preface this by saying that lately I’ve had a good stroke of luck, a lot of the applications I’ve been working with you could just stick in a container, and it just works, which is kind of surprising, not only for me, but also for the teams that I’m working with. They think, you know, oh my god, I have to move into Docker, it’s gonna be this huge thing. And I’m like, well, let’s just write a Dockerfile and try it out. And then all of a sudden, you do a docker run, and the application just works. They’re like, wait, what? I thought this was gonna be hard. But generally speaking, that’s where the refactoring efforts really come in, right? Because not every application can just cope with 10 copies of itself running, right? Like, if you have sticky sessions happening, if it’s a web application, or if there’s some sort of state that the application persists that depends on one copy of the application reading from that state at any given time. You know you don’t want to introduce race conditions like that, where you have 10 different replicas of an app, and they’re all trying to read from the same thing, and then, you know, we have versions of the app that are giving you stale data. Data is one of those things that is a challenge in general. I will say one slight disadvantage of Kubernetes, and I’m cautious to call it a disadvantage, because I think it’s a good opinion and a good best practice: Kubernetes assumes that your workloads are stateless by default. Right? So even though it has support for persistent volumes, and support for StatefulSets and all these things, by default, Kubernetes would prefer that your application be stateless.
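The "let's just write a Dockerfile and try it out" step often really is this small. A sketch for a hypothetical Node.js service (the base image, port and start command are invented for the example and depend entirely on your stack):

```dockerfile
# Build and run with:
#   docker build -t my-app .
#   docker run -p 8080:8080 my-app
FROM node:18-slim
WORKDIR /app

# Install dependencies first so Docker can cache this layer.
COPY package*.json ./
RUN npm ci --omit=dev

# Copy the application source and declare how it runs.
COPY . .
EXPOSE 8080
CMD ["node", "server.js"]
```

If `docker run` just works, the remaining effort is usually the statefulness concerns discussed next, not the containerisation itself.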
So that way, when your pod gets killed and gets recreated somewhere else, there’s no complaints of Oh, but the system that I’m on has a thing, right?

I love that you said prefer. It’s like, I’d really, really like it to be stateless.

Yeah, I mean it’s a kind request, right? Because at the end of the day, Kubernetes aims to be able to run any application of any kind, stateful or not. But stateful applications are stateful, and you have to deal with that state somehow. And that usually requires some hard dependencies being created that can be difficult to manage in a distributed system like this. Right. And I think there’s actually one… Go ahead, sorry.

I was going to say, I think that gets back to what we were talking about before: the overall opinion of Kubernetes is that it wants everything to be very dynamic and configured on the fly. And as soon as you introduce state, you have to hamstring some of that dynamism, because you just can’t. Some of the things that it wants to do, given the way it was designed and the opinions that went into how it should operate, you have to short-circuit a few of those mechanisms, because as soon as state is in play, then things become a bit more rigid than it would, to your point, prefer.

Right. Right. And I mean, I think that’s one of the complaints that people have when they use Kubernetes: you know, I have a database, and I’m trying to run databases in Kubernetes, and it’s not working out how I want it to. Well, Kubernetes can do that, you can run databases, but again, it comes back to state; managing that state can be really tricky. And the good thing is that Kubernetes does have resources to handle it. It’s just, like you said, you’re always going to hamstring your ability to be flexible and dynamic if you have to deal with state.

So if you are dealing with state, what are some of the strategies to have state, or deal with it in an elegant way, so that you’re maximising what Kubernetes can really get you? What are the mitigation strategies or the architecture patterns?

I will say, this is something I’m not an expert in, and I’m still learning more about this. So I probably won’t give the authoritative set of answers on this topic. However, what I will say, in terms of getting your system to work on Kubernetes, is: leverage the resources that are available for those things. For example, if your application depends on some files existing on a file system. We’ll get into caches, but some applications use the file system as a cache, or use it for stateful logs. If you’re doing that, then you’re going to want to start looking into persistent volumes and persistent volume claims and storage classes and stuff like that.

Yeah, explain like I’m five again. So when we talk about volumes and PVCs, what is that exactly?

Yeah. So normally, applications assume that the file system is local to them, and that it’s on the same machine as the application that’s running, if they have some notion of state. The problem with that, though, is that your application, or the pod that’s running your application, I know, ‘let’s explain like I’m six’, but the container that’s running your application can live on any host. In a Kubernetes cluster, you’re gonna have more than one host that’s able to accept your container and the application that’s running in it, right?

Right. And the scheduler is going to make a decision, when you ask it to, about where it should be placed. And it’s going to place it there. And you won’t necessarily know in advance where that’s going to happen.

Precisely, and you don’t have any control over that. I mean, you do; you can add annotations and stuff, like taints and tolerations, to control where that goes. But ignoring that, because generally speaking, if you’re doing that, that’s a smell or something.

Right, all things being equal though. If you’re treating all nodes as equals, they’re just gonna go, here’s where I’m putting it.

Right, precisely, which means if you store stuff locally, and you’re depending on local storage, that’s a big problem for you, right?


So if you have to use state, you’re going to want to use something that isn’t tied down to a specific system. That means using something like persistent volumes and persistent volume claims that can store your state somewhere that’s more distributed, more dynamic. Or even if we’re not talking about Kubernetes, normally the recommendation would be: okay, well, instead of using a local file system, let’s use something like S3 or Blob Storage, or some kind of distributed file system. That way, we don’t have to worry about losing your stuff if the host goes down. You know, there’s more resiliency, more redundancy there. Let’s go ahead and store it there. But generally speaking, that goes back to my cache and logs example that I mentioned earlier. Picking the right kind of storage is important as well, right? I worked with an application that was using the file system as a cache, but even though the file system can cache stuff, it’s not the ideal place to hold a cache. You know, Redis, Memcached, there are systems that can do this better than the file system can, and it usually involves less code.
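The persistent volume claims Carlos refers to look roughly like this; a pod requests storage by claim name rather than by a node-local path, so the state can follow the pod (the claim name, storage class and size here are illustrative, and the available storage classes depend on your cluster):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 5Gi

# A pod spec then references it like:
#   volumes:
#     - name: data
#       persistentVolumeClaim:
#         claimName: my-app-data
```

The storage class decides what actually backs the claim (cloud block storage, NFS, etc.), which is exactly the "not tied down to a specific system" property being described.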

So picking the right tool, the right system, for the kind of data you’re going to be managing is important. But I know that this is a conversation of, so if I have state, how do I deal with it, and I feel like no conversation like this is complete without suggesting 12 factor. So if you’re hearing this for the first time and you’re not sure what 12 factor is: the 12 factor app is an ideology for making an application that works on distributed systems like Kubernetes, where you don’t have to worry about being dependent on the host that the application is running on, or on hard dependencies like this; the application can just die and come back, it can scale itself infinitely, and the application will just behave like you expect it to. The 12 factor ideology is a way of getting to that point. And one of the things that 12 factor explicitly talks about is having logs be event streams, instead of logs being a file. And in a lot of legacy applications, and I won’t even say legacy, because a lot of applications, even applications written today in 2021, still assume a log is a file on a system. And the problem with that is, if you depend on that log for anything, you’re either using some kind of agent that slurps that log up and ships it off somewhere, or, if the host dies and that log isn’t available anymore, you lose a bunch of analytics about your application, a lot of the behaviour of your application.

Or even worse, if it’s containerised and you didn’t know about this behaviour before and the container exits, bye-bye logs. 

Precisely. Bye-bye logs. And that’s never a fun time, especially during downtime. You try and figure out, why did my application crash? And then you realise, wait, I just lost all my logs? Because there’s a container that forgot to mount a volume.

Yeah, you absolutely have no clue what happened.

Precisely. So when you treat your logs like an event stream and you log to standard out, one of the things that you get is the ability for logs to be stored elsewhere, and for the logs to be captured in an easier way. Because especially if your application is going to run in a container on Kubernetes or Nomad or anything like this, you’re going to want to have something that can just read those logs from standard out and then ship them off somewhere else. And there are no dependencies, at least from a state perspective, on the system it’s running on, which is nice when your pod gets killed by the scheduler and moved elsewhere. It’s a real benefit there. So treating logs as event streams is really important. But I guess the general thing I’m trying to say here is that when it comes to state, really consider whether your application needs to hold state, and if so, try to leverage the tools that Kubernetes provides for that. If you’re going to go onto Kubernetes, it’s an easier time for you and an easier time for the developers who are going to be maintaining it.
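Treating logs as an event stream usually just means writing structured lines to standard out and letting the platform collect them; a minimal sketch in Python (the field names are illustrative, not from any particular logging library):

```python
import json
import sys
import time

def log_event(level, message, **fields):
    """Emit one structured log event to stdout as a JSON line.
    The platform (Docker, Kubernetes, a log shipper) handles
    collection; the app holds no log files and no log state."""
    event = {"ts": time.time(), "level": level, "msg": message, **fields}
    sys.stdout.write(json.dumps(event) + "\n")

log_event("info", "request handled", path="/healthz", status=200)
```

With this in place, `kubectl logs` or a shipper like Fluentd can pick the stream up from any node the pod lands on, which is the 12 factor point above.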

Yeah, 100%. And you know, my stock recommendation, I mean, data is a slightly different problem which we’ll talk about in a second. But if we’re just talking about application state, my recommendation to people for years has been: look, really consider externalising anything that holds state away from the application logic. Use something external to the actual application component to hold state. That way, if you scale it up or scale it down, state is still, you know, being held somewhere external, and you don’t have that tight coupling; you can treat them as separate objects. But with data that gets even more tricky, because, especially if you’re coming from, let’s say, a legacy application where maybe data-driven or domain-driven design was not really a thought process when things were designed, you have data sets that are extremely commingled, meaning you’ve got things crossed up between different concerns, all in one big data set. That leads to big databases and big trouble in Kubernetes town, right?

Yep. Yeah, for sure. 

I would love your opinion on this. But one thing that I’ve seen work really well over time is, you have to apply some logic and some tooling to start to break those data sets up, to basically start to shrink the scope of the data down. Because if the scope of the data is just huge and it’s monolithic, then your options are limited. But if you can start to carve it up into domains and separate the concerns and start to shrink the scope of any individual data problem, then more options are open to you. And there’s an entire class of tools that has really grown up around the Kubernetes ecosystem over the years, very, very lightweight databases that are intended to be run in a very dynamic environment. But the conceit of it is that you have relatively contained datasets. So the stuff that I’m thinking of is like CockroachDB.


So I mean, have you had any experience with some of those patterns or approaches?

I’ve had some experience around data and data management. Now, again, that’s not my forte; books have been written about this, courses are available on this, it’s a huge field. But one thing that I will talk about is, to your point, having data domains and being able to rationalise which data goes where. I think a lot of legacy applications don’t really think about how the data needs to be structured. In that sense, I was reading this book, I cannot remember the author for the life of me, but if I do, I will send you a note and you can include it.

Yeah, we can tweet it out. And we’ll also tweet out a link to the twelve-factor spec when the episode goes up.

Awesome. It’s a really great book about database design. And the one thing that really stuck out to me was that the author of this book spends several chapters not even mentioning database engines at all, they talk about how to structure your data.

Ah, that’s good.

And one thing that was even more shocking to me was one of the exercises this author suggests: when you’re trying to plan a database, you sit down with the team that you’re going to plan the database with and, with pen and paper, ask them a series of questions about what the data is and how it’s related to each other. Who uses the data? Who’s creating the data? You basically audit the lifecycle of this data: how does it get born? How does it get used? How does it die? And then you essentially create a table around it, and you create the relationships around it. And he emphasises spending a lot of time on this before even trying to create a schema. And the reason why is because relationships are a huge, huge problem. I think what a lot of application developers realise, as the application begins to mature and as the data for the application changes, is that these relationships that need to exist between sets of data can get really, really complicated. And if you don’t account for that as early as possible, when you’re designing the data, it can lead to, like you said, huge databases and huge amounts of data. And then, oh, now we have to put our stuff in containers and move them to Kubernetes because it’s the corporate mandate, essentially, and that means we need to do something about all this data. And that gets really challenging. So that’s one thing I’ll say: really being able to look at your data and rationalise it, and determine how the data is related to each other and how it’s going to be used. That’s something that, if teams have the time to do, is really worthwhile.

And again, the book that I’ll provide the link for has a lot of good exercises for how to do that. But another thing I want to talk about is queries, hand-written queries. A lot of applications hand-write their own queries, which is great if you know the database that you’re using well.


And you know where the data is, how it’s structured, and how to access it. But a lot of developers aren’t database administrators; that’s not their specialty. And so writing select-star queries all over the place, for data that can grow to be potentially huge, can be a problem just in general. But it can be a problem in a containerized environment for many reasons, and one of the reasons that’s specific to Kubernetes is that Kubernetes has a notion of liveness and readiness. And if you’re using queries at all to determine whether the application is ready, or even whether the application is live, and that query just takes a long time, in some instances your application won’t even start. It’ll take too long to respond to Kubernetes asking, hey, are you ready, and then your application never starts, and you’re wondering, why is this happening, I don’t understand, Kubernetes is hard, there’s too many moving parts. All of that stuff happens, right? So I would say, working with DBAs to write structured queries and, barring that, using ORMs that are good at creating models around the types of queries you need to make, those are generally good ideas. If your team hasn’t done it, look into it; that’s something I would generally recommend.
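As a rough illustration of the point about probe queries: a readiness check can make a constant-time round trip to the database instead of a select-star over a table that may have grown huge. Here’s a hypothetical sketch using Python’s built-in sqlite3; any driver that raises on a dead connection works the same way.

```python
import sqlite3


def database_ready(conn: sqlite3.Connection) -> bool:
    """Readiness check: a constant-time round trip to the database.

    `SELECT 1` proves the connection works without scanning any table,
    unlike a `SELECT *` over a large table, which could stall the probe
    long enough for Kubernetes to decide the pod never became ready.
    """
    try:
        cur = conn.execute("SELECT 1")
        return cur.fetchone() == (1,)
    except sqlite3.Error:
        # A dead or misconfigured connection means "not ready",
        # not "crash the probe handler".
        return False
```

Used from a readiness endpoint, this answers Kubernetes in milliseconds regardless of how much data the application actually holds.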

Yeah, I’m not a data architect either, but I definitely have the scars of dealing with the lack of a data architect in a lot of cases. And the bit that you touched on about relationships is absolutely spot on. The other interesting aspect of relationships is that the relationships you have, or that are defined today, make sense today. But in a couple of iterations, or a couple of generations of the product, those relationships should be evolving along with the needs of the system and the need that it’s filling. And sometimes the relationships remain locked and static. That’s kind of what I’ve seen: the relationships never really change, but the needs change. And rather than alter the relationships, they start to look at stuff like stored procedures, things to sort of tap dance around a relationship structure that doesn’t quite work anymore. And that just compounds the problem. And off we go. So it’s a fun, not so fun, little knot you might have to untangle.

I think you gave a bunch of your listeners a little bit of drama, or trauma, when you mentioned stored procedures.

Yeah, well, I think it’s a shared trauma that we all have.

To be honest. Yeah. We’ve all fought that battle, somehow, somewhere. Many times.

Yeah, for sure. But I do want to come back to the liveness and readiness probes, because again, if we’re talking about Kubernetes generally, I think this is another area that most people don’t have a good appreciation for. And while it may seem simple on the face of it, it’s actually really powerful and solves a whole set of conditions and problems that real applications have. So okay, what’s the difference between a liveness probe and a readiness probe?

That’s a great question, and it comes up all the time. So liveness, the way to interpret this is, liveness is your application saying, hey, I’m alive, I exist, right? But that doesn’t necessarily mean the application is ready to accept requests from people. It just means my process is working, I’m up and available, I’m alive. But readiness means the application is actually ready to take on requests. And that can mean a lot of different things. And I think a lot of teams using Kubernetes for the first time, dealing with readiness probes and readiness, struggle with this. Even among the teams at my current client, a lot of them are shocked when they spin up a new pod, a containerized application, in Kubernetes, and 30 seconds later, a minute later, X number of minutes later, the application crashes and goes into CrashLoopBackOff, which is a state where, again, Kubernetes is very resilient. So when your application crashes, by default Kubernetes is going to do all it can to get that application started. And so it loops, trying to get the application started, but it’s not going to loop immediately, because maybe there’s some transient error that’s preventing this brand new application from starting, and so it’s going to back off a little more every time it restarts, kind of like, for the networking folks out there, TCP and retransmissions, right? So CrashLoopBackOff actually means something; that’s what it means for people who have seen it but don’t know what it is. So Kubernetes is going to keep trying to get this application ready, but the application is never going to start, because it never gets ready in time, or it doesn’t get ready before Kubernetes thinks it should be ready.
And going back to what I mean by readiness: in some instances, readiness is the same as liveness. Your application started, it does a simple thing, and as soon as it started, it’s ready to take on requests. But in a lot of cases, it usually means, hey, is the database that I need to use to retrieve and store data actually reachable? Are my downstream dependencies reachable? Are the sanity checks passing? A bunch of these little smoke tests that you can run initially, when your application starts, to determine whether not only the application itself is right, but its dependencies, and its connections to its dependencies, are ready as well. And sometimes that can take a long time. So for those listening and encountering problems with readiness, one thing I like to do, and again, this is one of the major reasons why I like using Docker for local development and really getting the application running in a container, is just measure locally how long it takes for the application to start. And sometimes people are surprised; they think their application starts immediately, but it actually takes two minutes, especially if it’s a Spring Boot application, or a WebLogic application, or some other very, very heavy middleware that needs to start. It could take a long time. Time it, and then add a little bit of padding to your readiness probe’s period, and just set it to that. And if the startup time is really bothering you, then that’s a good incentive to refactor, to look into your application and profile it and figure out: why is it taking so long to start? What am I doing that I can control to minimise that time?
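A minimal sketch of the liveness/readiness distinction in Python follows. The handler paths and the dependency check are hypothetical; in Kubernetes the probes themselves are configured on the pod spec and simply hit endpoints like these.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def dependencies_ready() -> bool:
    # Stand-in for the real smoke tests: is the database reachable,
    # are downstream services up, do the sanity checks pass?
    # (Hypothetical; wire in your own checks here.)
    return True


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/healthz":
            # Liveness: the process exists and can answer at all.
            self.send_response(200)
        elif self.path == "/readyz":
            # Readiness: the dependencies are reachable too. A 503
            # tells Kubernetes to keep the pod out of the Service's
            # endpoints without restarting it.
            self.send_response(200 if dependencies_ready() else 503)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep probe traffic out of the request log
```

The key asymmetry: a failing liveness probe gets the container restarted, while a failing readiness probe just withholds traffic, which is why slow dependency checks belong on the readiness side.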

Yeah, you just bullseyed what I hoped you were gonna say, which is that if the startup time is so long that it’s outside the acceptable norm, okay, well, there is a fix for that. And it’s called refactoring.

It’s called investing the time, right? And I think this goes to the general notion about adopting containers, adopting Kubernetes, adopting modern application development practices. It’s an investment. It’s not just, hey, I bought Rancher, or hey, I bought AKS, or what have you, I have my applications running on it, and now I’m modernised, right? That’s just the start of it. If you discover that your applications running in Kubernetes still take a really long time to start, and they still fail in unpredictable ways, and all these things, then you haven’t really invested the time to fix them before you went into containers, before you went to Kubernetes. If these problems were happening before, they’re going to happen in Kubernetes, and probably even worse, because now you have a distributed system that makes certain assumptions about your application that you didn’t account for in your previous system. And now you have stuff that’s crashing and you don’t know why; now you have data that’s stale and you don’t know why. Refactoring, breaking dependencies, and investing in tests, these sorts of things that you would have had to do regardless of Kubernetes or not, these are investments of time in trying to make not only your applications better, but your engineers better, your engineering culture better. These things have to happen. And Kubernetes is awesome, but it’s not a silver bullet; it’s not gonna fix those.

So people have asked me over the years, when you talk about cloud native applications, what does that mean? And there are a lot of different definitions floating around out there. My answer has always been to point them to some of the more generally accepted definitions, but then say: the shortcut is, put it in a container, run it on Kubernetes, and find out. Because you’ll find out real fast, right? It exposes all of the problems really, really quickly. So you do it, you run it, you find one big problem, you fix that problem. Now you find four more, and eventually you crawl the tree of all these problems, and you end up with a cloud native application at the end of it. But it is a journey to get there.

Absolutely. It’s always a journey. I mean, it’s funny; a lot of the teams that I’ve worked with have had leadership that really does think that buying and moving to AWS, or adopting Kubernetes, is, like, the way, the thing.

We’re done now, right? 

Right? And it’s like, I mean, cool, you’re done in the sense that you bought the thing and the checks have been signed. And maybe you have applications that were lifted and shifted onto it. But you’re just starting, you’re just getting to the beginning of it. You’re at the starting line now, right? Like, congratulations, you’re at table stakes. Now we can actually get started with the real work.


And it’s funny, because one of the teams I’m working with right now is in this predicament where they need to move their application from one version of OpenShift to another, and then they need to change the CI, the build tools they’re using, from Jenkins to something else. And in so doing, the new system basically establishes an opinion where your test environments are supposed to be ephemeral and your production environment is easily updatable. It’s not static; it’s a thing you can update through a pipeline, and you do it through pull requests, and all that stuff. And a lot of the toil that we’re experiencing is around long-lived environments. It was around the idea of having these long-lived environments that people assume will be up forever. In this new system they’re trying to implement, they’re trying to push the idea that, no, long-lived environments that live forever, and are constantly changed, and drift further and further away from production, are just not a good idea. And in trying to merge these two, the problems we’re having in this scenario aren’t really technology driven. They’re more people and process driven, in the sense that, like always, there always is pain. They’re on Kubernetes, right? They have been, and now they’re just moving to a newer version of it. They’re running in containers, and they’re doing all the right things from a technical perspective. The real problems aren’t that. Like, again, table stakes: congratulations, you’re in the 20th century, but your processes are still stuck in the 18th or 19th century. That’s where a lot of the work is, and that’s where a lot of the challenges are.
And I think it takes leadership that really understands that and is willing to say, hey, you know what, we can buy whatever we want, but we can’t buy our way out of broken process, and really trying to fix that.

That is the real tough part when we talk about transformation: the transformation that really needs to happen behind some of these technologies. Because it’s not just the tech; it’s the processes and the skill sets that go along with it that are the hard thing to move. And you’re right, you can’t just write a check for that. There’s no substitute for actually doing the work.


And every time I hear people say, oh yeah, we’re running Kubernetes, and we have it all hooked into our change management system, and all this stuff, it always makes me think of that scene in Back to the Future Part III, where the DeLorean isn’t working and the time machine is being pulled behind a horse. You know what I mean? That’s the image that’s always in my head. Like, oh crap, you hooked up a time machine to a horse, man.

You know, congratulations. That’s real cool. But also, I don’t think you want that.

Yeah, exactly. Like, is that really what you wanted? Is this what you bought? I don’t think so.

Yeah. And so, you know, on my LinkedIn my slogan is Make Production Fun Again, and I really mean that. I think a lot of the reason why people fear production and fear releases is because of process, and because of the fear that if production breaks, then all hell breaks loose. And that’s going to be the fear regardless of where your applications are running. I’m almost certain that if I was working with a team who didn’t want to use Kubernetes, and they wanted to stick to using VMs for everything, and they wanted to use configuration management to deploy everything, then as long as they were willing to adopt sane software engineering practices: writing tests before writing code, having tests for everything, having fast unit tests, using a single branch as a source of truth, being able to do everything through pull requests or merge requests, and doing code reviews and these things. If they were willing to adopt those things, I bet they would move a lot faster than teams that just go on to EKS and say, cool, we’re there.

Oh, 100%. And I mean, there’s also a certain aspect to it of: we talk a lot about the tools and the technology, because it’s interesting, and the way that the tools get applied is, in my mind, even more interesting. But at the end of the day, they’re a means to an end. They’re not the goal, right? They’re tools in our toolbox that we have at the ready to be able to solve a problem. But at the end of the day, good products win.


So, whatever works for you to be able to build really good products, you know, hey, a big thumbs up from me. The result is what matters. How you get there is interesting, but it doesn’t necessarily matter. 

Right? That’s precisely it, man. 

Yeah, so I guess to wrap things up: your experience goes from being a Kubernetes hater through it being, literally, a big part of your job now, or really being your job. There are a lot of people who are right at the beginning and are still figuring this out and assessing, hey, is this something that we want to get into? Is this the thing? So if you had to do it all over again, what would you tell yourself back then? What would be the advice? What would you do differently?

I would have actually gone into an environment running Kubernetes in production much sooner than I did. And it’s funny, because one of my really good friends was on a Kubernetes team; he runs Kubernetes releases now, and he’s a big part of the Kubernetes machine. I’ve known him forever, and I’ve known him since the start with Kubernetes, which was a while back. He would tell me stories about teams using it, people using it, his experiences with it. I wasn’t living that. Instead, I was just like, rah rah, Docker Compose is the answer to everything, or, rah rah, that looks a lot easier. And to this day I still haven’t used it. Unfortunately, today, if you look for articles about people using it, you’ll find much, much fewer in comparison to Kubernetes. But I feel like if I had started actually trying to experiment with real workloads on it more quickly, and really tested my hypothesis that Kubernetes is too hard and this is just resume driven development, if I had tried to put that to the test earlier, I think I would have seen the light faster. Which means I would have had more experience to be able to talk about it and evangelise it now. So that’s one thing I definitely would have done, if I had the opportunity.

Well, practically speaking, one of the entry points for you was Kubernetes The Hard Way, and arguably that still has a lot of value, I mean, if you want to understand the way the thing works.


Are there are there specific resources that you would recommend?

Of course. So like you mentioned, Kubernetes The Hard Way. If you want to learn Kubernetes from scratch, and you want to learn what makes it tick, that is the authoritative source for it. And I would specifically recommend not doing it in GCP like the docs actually say. I think there’s a lot of value in trying to do it on some other system, in the sense that there’s a lot of pain in trying to get it to work on a system other than what those labs assume, but the pain is useful. The pain is a learning experience; at least, it was invaluable for me. So definitely Kubernetes The Hard Way, not on GCP. There are a lot of Udemy courses on Kubernetes, for those of y’all that are more visually focused and like courses. I personally haven’t used them, but I know a lot of my colleagues swear by them, so I would definitely recommend Udemy for that.

I’m trying to think. Oh, right. So another thing, as far as tools go, I mentioned it earlier during this conversation, but kind is by far one of my favourite tools for local Kubernetes development. I think it’s really great, because it’s Kubernetes running inside of Docker, and it’s really, really lightweight. So if you just want to kick the tires a little bit and start learning how to write manifests and deploy applications and set up all the crazy stuff that you can set up on Kubernetes, kind is definitely the way to do it. K3s is another way to do it. That’s a version of Kubernetes maintained by the folks at Rancher, and it makes a couple of changes to the internal architecture of Kubernetes, but it does make it a lot lighter to run. And it’s good for production as well; there are people on Reddit and elsewhere talking about using K3s in prod on bare metal. So that’s something else you can experiment with.

And I guess the last thing I’ll talk about is, even though there’s a lot of stuff in the Kubernetes docs and they can be hard to navigate, I still recommend reading them, because honestly, for a project this large and this complicated, the docs are pretty good. They do a good job of explaining all the core concepts and how they fit in. But one of the things that I use pretty much, I won’t say on the daily, but on the regular, is their API reference. And if you use the API reference, you’ll learn all about what a pod is, what it can accept, and what it can do. You’ll learn the nitty gritty of it. So that’s it. I mean, that’s everything I recommend.

That’s fantastic. Yeah, I took note of all of that. We’re going to tweet that out as Carlos’s recommendations on Kubernetes when the episode goes up.

Blame me! If those things are not useful and they’re garbage for you, I’m on Twitter and LinkedIn. Blame me, and we’ll talk about it.

Well, perfect segue. How can people get in touch with you?

Yeah, so I’m on LinkedIn. I have an easy redirect URL. But if you think that has a virus or something, I’m on LinkedIn at Carlos N HTX; that’s Houston, Texas, X-ray. So you can find me on LinkedIn there. I’m also on Twitter at @easiestnameever; that’s literally my username. So you can hit me up there. I also respond to email. I’m old school that way, I guess. So if you’re listening to this and you want to, like, throw a pitchfork at it, or you think something’s cool and you want to learn more, email me or send me a message on LinkedIn. I’d be happy to get in touch.

Awesome. Fantastic. Well, Carlos, thank you so much for spending the time. This has been absolutely great. And it’s been really fun catching up with you.

Yeah, man. The pleasure’s all mine. Thank you for having me on.

Well, that was a lot of fun for me. I hope that you took away something valuable from that conversation with Carlos. And, as I’ve already said, thanks so much to Carlos for taking the time out to talk with us and to lay some wisdom on us. As always, you can rate and review us on your favourite podcast app, or you can tweet us at @cloud_unplugged. We also respond to emails, or check us out on YouTube at Cloud Unplugged for episodes, transcripts and bonus content. Something new: we have a Slack community! Check out the episode description for a link, and join us there to keep the conversation going. Thanks for listening; we will see you next time.
