
Is Being a Platform Manager the Most Stressful Job in the World?!

Category
Platform Engineering
Published
February 22, 2024


5 tips to help you manage more with less

Not every manager of infrastructure or platforms has the luxury of budget, time and people to be the innovators in the business, regardless of how aspirationally forward thinking we are.

We spend a long time looking outward at the industry, speaking to peers, vendors and partners to understand the advances in technology and ways of working that will help us deliver a best-in-class service to our business and IT delivery. The theory is well understood, but in reality so many constraints work against us that implementing some of these practices in our own organisation is hard.

Even if you have a limitless budget, you will still often struggle with time, people, skills and process, amongst other things. Here are a few tools and tips that I have come across from many years of helping managers of infrastructure and platforms to understand where to invest their time.

Below is a list of areas to focus on, based on years of experience of helping organisations transform their software delivery:

1. Identifying where to invest time and budget

Beyond the business-as-usual work of keeping the business running, the job of infrastructure and platform engineering teams is to reduce the blockers and wait time in getting platforms, tools and infrastructure into the hands of product teams, and to help deliver applications with as little friction as possible.

We want to embrace a DevOps culture and way of working so that our software delivery process is unhindered by technology and our engineering efforts go hand in hand with the product delivery team’s agile processes.

But with limited resources, how best can we make an impact on delivering software? One of the tools from the agile practices toolbox I have used is Metrics Based Process Mapping, a practice that helps identify wait time in an end-to-end process. The practice should not be limited to your team; it must encompass the whole of software delivery, end to end. The handover points in processes involving your platforms and your delivery teams can then be analysed to find where you might invest in automation to reduce friction and wait time.
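To make the idea concrete, here is a minimal sketch of the arithmetic behind such a mapping exercise. It assumes each step has been recorded with its hands-on work time and its total elapsed time, so that wait time is the difference between the two. The `Step` class, the step names, the owners and all of the numbers are hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    owner: str
    process_hours: float  # hands-on work time for the step
    lead_hours: float     # elapsed time from request to handover

    @property
    def wait_hours(self) -> float:
        # Time the work spent waiting rather than being worked on.
        return self.lead_hours - self.process_hours


def biggest_waits(steps, top=3):
    """Return the steps with the most wait time, largest first."""
    return sorted(steps, key=lambda s: s.wait_hours, reverse=True)[:top]


def activity_ratio(steps):
    """Share of the end-to-end elapsed time that was hands-on work."""
    total_lead = sum(s.lead_hours for s in steps)
    total_process = sum(s.process_hours for s in steps)
    return total_process / total_lead if total_lead else 0.0


# Hypothetical end-to-end delivery process, not real measurements.
STEPS = [
    Step("Write code", "product team", 16, 24),
    Step("Request cloud environment", "platform team", 2, 72),
    Step("Security review", "security team", 4, 40),
    Step("Deploy to production", "product team", 1, 5),
]

if __name__ == "__main__":
    for step in biggest_waits(STEPS):
        print(f"{step.name} ({step.owner}): {step.wait_hours:.0f}h waiting")
    print(f"Activity ratio: {activity_ratio(STEPS):.0%}")
```

Ranking steps by wait time rather than by work time is the point of the exercise: the handovers with the longest waits are usually the first candidates for automation.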

Join forces with DevOps

Does your DevOps team also measure their processes? If so, they should be using industry standard measurements such as the DevOps Research and Assessment (DORA) software delivery and operational performance metrics. Ensure that your team’s processes and metrics are collected and included when measuring specifics such as:

  • Lead time for changes
  • Deployment Frequency
  • Time to restore service
  • Change failure rate
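As a rough illustration of how these four measures relate to data you may already hold, the sketch below derives them from a list of deployment records. The `Deployment` fields, the sample records and the use of the median as the summary statistic are all assumptions made for this example; real tooling will define and collect these differently.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def dora_summary(deployments, period_days):
    """Summarise the four metrics over a reporting period (illustrative only)."""
    lead_times = [_hours(d.committed_at, d.deployed_at) for d in deployments]
    restores = [
        _hours(d.deployed_at, d.restored_at)
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    failures = sum(1 for d in deployments if d.failed)
    return {
        "lead_time_hours": median(lead_times) if lead_times else None,
        "deployments_per_day": len(deployments) / period_days,
        "time_to_restore_hours": median(restores) if restores else None,
        "change_failure_rate": failures / len(deployments) if deployments else None,
    }


# Hypothetical week of deployments, not real data.
DEPLOYMENTS = [
    Deployment(datetime(2024, 2, 1, 9), datetime(2024, 2, 1, 17)),
    Deployment(datetime(2024, 2, 2, 9), datetime(2024, 2, 3, 9),
               failed=True, restored_at=datetime(2024, 2, 3, 11)),
    Deployment(datetime(2024, 2, 4, 10), datetime(2024, 2, 4, 14)),
    Deployment(datetime(2024, 2, 5, 9), datetime(2024, 2, 5, 21)),
]

if __name__ == "__main__":
    for metric, value in dora_summary(DEPLOYMENTS, period_days=7).items():
        print(f"{metric}: {value}")
```

If your platform team's provisioning steps sit between commit and deploy, their wait time shows up directly in the lead time figure, which is why the two teams' measurements belong together.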

2. Define the interface boundaries of the platforms and the DevOps teams

Manual touch points into the infrastructure and platforms can be expensive and a blocker in speed to delivery. This might take the form of service tickets for your teams to provision cloud infrastructure or even a backlog item in an agile product delivery that is allocated to your team to do some manual work. There are key areas that can be identified immediately or through practices as outlined above.

A tactical fix might involve identifying each of these manual touchpoints and creating a roadmap for implementing interfaces into your processes for others to automate against. Use specific automation tools such as Terraform or Ansible to create automation interfaces as scripts within your product team pipelines or as secured processes with Ansible Automation Platform that can have multiple interfaces into running processes.

Be wary of shifting toil

I call this tactical because you risk simply shifting your team's work from manual processes to maintaining a growing number of scripts, APIs and tools. Be wary of building technical debt in the form of a growing engineering maintenance problem.

3. Reduce custom engineering

Investing time in automation gives us big wins: the impact of automating is reliable, repeatable and controlled infrastructure and platforms. However, there will come a time when our engineering efforts create bigger problems than they solve.

A great example is where we built Kubernetes platforms to speed up the delivery of business software through container workloads, because it was the best way to do it at the time. Today, we wouldn’t think of building our own Kubernetes platform; we would use public cloud Kubernetes like Microsoft's Azure Kubernetes Service. Businesses realised that they were not in the business of engineering platforms but of delivering business outcomes from technology.

Be aware of hidden engineering costs

Choosing a public cloud or other Kubernetes distribution, however, does not solve all of your platform engineering problems. Even with public cloud Kubernetes, you will still need to create scripts and processes to offer the service at scale to your teams. Taking the standard cloud services and turning them into a scalable, secure, repeatable offering for multiple product teams across multiple environments is expensive.

However, tools like Appvia Wayfinder were created to solve this exact problem. By shifting the automation left to become developer self-service with baked-in security best practices, they enable teams to move faster and spend more time on the harder problems such as observability, metrics and security awareness. Choose tools that are simple to maintain and supported by a vendor, and be wary of tools that require specialist knowledge and skills that take people away from delivering higher value functions.

4. Retain skills through building an engineering culture

You will undoubtedly already know that skilled, experienced people in our industry are hard to find, and when we do find them they are very expensive and hard to retain. Increasingly I have seen organisations turn their recruitment policy from hiring the best talent they can find for every role to hiring people with the potential to grow into the role. OK, hiring for potential has always been a best practice, but recently we have moved to hiring very junior people to fill much more senior positions. This can work, but it needs careful consideration and investment in career progression and mentoring.

There are amazing young engineers who are just starting their journey who we can turn into amazing senior engineers. However, you must retain your senior engineers and create an engineering culture which embraces learning and experimentation.

Commodity engineering is dull!  

Be aware that having your engineers work on thankless tasks, building and maintaining what should be commodity services, is not going to keep or grow your talented people. Buy in the commodity tools and services to free up time for higher value engineering and for growing an engineering culture.

5. Aim for everything as a service

If you have got this far into this blog then you will have seen that the focus is on reducing friction on the services you deliver and removing bottlenecks. You should strive for providing these services to your software delivery teams as self service functions in a safe, secure and repeatable way.

Secure the services

You must have security at the forefront when building self service. It’s often the last thing that is considered and the most complicated thing to implement. Deciding how to manage secure access for people, teams and services is not trivial. Consider tools such as Ansible Tower to provide secure and auditable execution of scripts, or tools like Appvia Wayfinder that were built specifically to provide secure self service for scaling public cloud Kubernetes.

Audit the services

Auditing is not just for regulated industries; it’s part of your feedback loop for diagnosing activities and behaviour. Make sure that your self service delivery solution provides an audit component that records the self service interactions. If it’s not there, you will quickly realise how essential it is when things go wrong.
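As a minimal sketch of what recording a self service interaction could look like, the example below wraps a self service action so that every call leaves an audit entry saying who did what, when, and with what outcome. The `audited` decorator, the in-memory `AUDIT_LOG` and the `create_namespace` function are all hypothetical stand-ins; a real platform would write to durable, tamper-resistant storage and would normally get this capability from the tool it is built on.

```python
import functools
import json
from datetime import datetime, timezone

# In-memory stand-in for a real audit store (an assumption for this sketch).
AUDIT_LOG = []


def audited(action):
    """Record who called a self service action, with what, and how it ended."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            entry = {
                "time": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "action": action,
                "params": {"args": list(args), "kwargs": kwargs},
            }
            try:
                result = func(user, *args, **kwargs)
                entry["outcome"] = "success"
                return result
            except Exception as exc:
                # Failed requests are recorded too, then re-raised unchanged.
                entry["outcome"] = "error"
                entry["error"] = str(exc)
                raise
            finally:
                AUDIT_LOG.append(entry)
        return wrapper
    return decorator


@audited("create_namespace")
def create_namespace(user, name):
    # Hypothetical self service action; a real one would call the platform API.
    if not name.islower():
        raise ValueError("namespace names must be lowercase")
    return f"namespace/{name}"


if __name__ == "__main__":
    create_namespace("alice", "payments")
    print(json.dumps(AUDIT_LOG, indent=2))
```

Recording failed requests as well as successful ones matters here: the attempts that went wrong are often the ones you most need to reconstruct later.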

Manage the services

Lastly, you must provide ways to manage the services that you offer. Thought must be given to how to observe, scale, secure, manage APIs, provide GUIs, manage certificates, patch, deprecate and evolve the self service capabilities that you are offering. If anything has resonated in this blog, it must be that this looks like a huge engineering effort: everything that I have warned against! You would be right in thinking this, so choose carefully how you do it. Seek out tools in the industry that are going to provide self service without the engineering effort.

Everything I have talked about is at the heart of what we do at Appvia. We have built and managed platforms for, and advised, customers like the Bank of England and the Home Office, helping to reduce the effort of maintaining the platforms and processes that remove the blockers and bottlenecks in delivering secure and reliable application services to the business and their customers. Feel free to reach out to us if you would like to find out how we can help.
