Digital Transformation is pushing companies to reinvent their processes and the way their technology teams operate: from maturing their Software Development Lifecycle (SDLC) to implementing a DevOps culture and moving to the Cloud. These changes have helped simplify the process, but not without challenges: there is more to orchestrate and there are more moving parts to maintain.
There has been a surge of specialized roles to tackle these challenges, like "DevOps" Engineers (who could more accurately be called DevOps Practitioners or Advocates, as DevOps is a culture and not a position) and SREs (Site Reliability Engineers). But there is still a gap between Operations (Infrastructure) and Engineering (Development) that needs to be closed beyond CI/CD and infrastructure automation: something that not only eases the collaboration between Devs and Ops but also lays the foundation on top of which applications get built. What is needed is a Platform that abstracts the complexity of managing infrastructure and complex service architectures, and that also uses CI/CD to reduce the time to market of new features. It should also be an authority on things like picking the right languages, tools, and frameworks to help technology teams achieve operational efficiency.
But what is The Platform? For me, The Platform is the abstraction layer that hides the underlying complexity of operating the software and infrastructure layers. It takes care of all the details of handling infrastructure operations, services orchestration, and CI/CD, and of monitoring all these components. Of course, as with any complex system, the Platform needs to be well documented to serve its purpose and be as transparent as possible to all its users and stakeholders.
So The Platform can be seen as an internal product whose stakeholders are the technology teams that build on top of it the software and applications that power the Business and help it thrive in this ever-changing technology landscape.
The Platform can be a multi-layered entity where each layer has its responsibilities and clearly defined boundaries, and it could go as follows:
Infrastructure is the deepest layer of The Platform, and one of the most important ones. Infrastructure should be easy to reproduce and auditable, using IaC (Infrastructure as Code) to simplify the management of resources and support Security and Compliance requirements such as BCP/DRP (Business Continuity and Disaster Recovery Plans). The Platform team should carefully decide which technologies and frameworks to use to achieve operational efficiency of the Infrastructure. Automation is critical for the infrastructure layer, as it reduces operating complexity and long-term maintenance.
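One way to picture the reproducibility and auditability that IaC brings is a "plan" step: compare the desired state declared in code with what currently exists, and list the changes before applying anything. The sketch below is only a simplified illustration of that idea in Python, not the API of any real IaC tool; the `plan` function and the resource names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a resource name mapped to its settings.
Resources = Dict[str, Dict[str, str]]


def plan(desired: Resources, current: Resources) -> List[Tuple[str, str]]:
    """Return the (action, resource) pairs needed to reach the desired state."""
    changes = []
    for name in sorted(desired):
        if name not in current:
            changes.append(("create", name))
        elif desired[name] != current[name]:
            changes.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            changes.append(("delete", name))
    return changes


# Desired state lives in version control; current state is what is deployed.
desired_state = {
    "web-server": {"size": "small", "region": "us-east"},
    "database": {"size": "large", "region": "us-east"},
}
current_state = {
    "web-server": {"size": "medium", "region": "us-east"},
    "old-queue": {"size": "small", "region": "us-east"},
}

for action, name in plan(desired_state, current_state):
    print(f"{action}: {name}")
```

Because the desired state is plain data under version control, every infrastructure change can be reviewed and traced like any other code change.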
Selecting a cloud provider is another critical aspect of The Platform. Vendor lock-in is almost inevitable but not necessarily bad: cloud providers compete fiercely with each other, developing new products and improving existing ones to simplify the move to the cloud, which benefits the companies that use their products. Choosing between a single vendor and a multi-cloud approach is another decision that must be taken with care, as each option has its implications. Staying with one vendor could "condemn" the company to depend on that vendor's ability to innovate and evolve, but let's be honest: every year almost all cloud vendors release a range of new technologies and improvements to their existing offerings. So picking the vendor that is most competitive across different areas is the best bet. Following a multi-cloud approach is tempting, as it could extend the range of technologies to choose from and reduce the risk of downtime through redundancy across multiple vendors, but it could also result in much higher costs, not only in money but also in operations and people.
Services orchestration is another vital component of the Platform and one that has recently flourished with the advances in Container and Serverless technologies. Picking the right technology is crucial for the infrastructure layer, not only for things like maintenance, scalability, and costs but also for how easily these technologies interact with the other layers. Container orchestrators like Kubernetes and Amazon's ECS simplify the deployment of applications to the cloud. And technologies like Service Meshes, which provide features like service discovery, load balancing, authentication, and authorization, can be powerful tools for implementing a microservice architecture in the cloud.
The cloud and technologies like containerization and service orchestration introduce new paradigms, requiring companies to re-think their Software Development Lifecycles (SDLC). Efforts like The Twelve-Factor App methodology tackle these challenges and help developers and software architects design software that is suited to this new landscape. Good development practices like Domain-Driven Design and Test-Driven Development can be critical for creating a rock-solid modern software architecture that can endure the passage of time, but that can also evolve and adapt to what the future may bring.
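As a small illustration of one Twelve-Factor practice, storing configuration in the environment, here is a minimal Python sketch. The variable names (`DATABASE_URL`, `LOG_LEVEL`, `PORT`), the defaults, and the `Settings` class are assumptions made for the example, not part of the methodology itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, failing fast if one is missing."""
    try:
        database_url = env["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set") from None
    return Settings(
        database_url=database_url,
        log_level=env.get("LOG_LEVEL", "INFO"),  # assumed default
        port=int(env.get("PORT", "8080")),  # assumed default
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"DATABASE_URL": "postgres://localhost/app", "PORT": "9000"})
print(settings)
```

The same build can then run unchanged in every environment, because only the environment variables differ between them.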
The Platform team could work as an internal consultant and authority for the Technology teams, helping them make the right decisions when picking the technologies they use to develop their software. These range from programming languages to libraries and frameworks. The team could also develop reference implementations and boilerplates that help developers focus on the business logic and not on how to implement non-functional requirements over and over again (things like cache logic, error handling, logging, etc.). The Platform team could help build core libraries that aim to reuse code and abstract functionality to deliver new products more efficiently.
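To make the idea of a core library more concrete, here is a minimal sketch of a helper that bundles cache logic, error handling, and logging so that product code only contains business logic. The `cached` decorator, the `platform.core` logger name, and `get_price` are hypothetical names invented for this example.

```python
import functools
import logging
from typing import Callable, Dict, Hashable, List, TypeVar

T = TypeVar("T")
logger = logging.getLogger("platform.core")  # hypothetical logger name


def cached(func: Callable[..., T]) -> Callable[..., T]:
    """Cache results by positional arguments and log cache hits and errors."""
    cache: Dict[Hashable, T] = {}

    @functools.wraps(func)
    def wrapper(*args: Hashable) -> T:
        if args in cache:
            logger.debug("cache hit for %s%r", func.__name__, args)
            return cache[args]
        try:
            result = func(*args)
        except Exception:
            # Log with traceback, then re-raise so callers still see the failure.
            logger.exception("%s%r failed", func.__name__, args)
            raise
        cache[args] = result
        return result

    return wrapper


lookups: List[str] = []


@cached
def get_price(product_id: str) -> float:
    lookups.append(product_id)  # stands in for a slow database or API call
    return 9.99


get_price("sku-1")
get_price("sku-1")  # served from the cache, so no second lookup happens
print(len(lookups))
```

A real core library would also need cache expiry and size limits; they are left out here to keep the sketch short.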
The Platform team is responsible for helping the Technology teams evaluate the systems used to implement their solutions and take into consideration all the aspects necessary to make those solutions scalable, maintainable, and observable. This includes selecting the right database, streaming platform, NoSQL database, etc. Some of these systems will be part of the infrastructure, so managing them as Infrastructure as Code is a must.
DevOps is a culture, and the Platform team should transmit it to the technology teams, aiming to achieve operational efficiency across the whole Software Development Lifecycle and to reduce time to market. The Platform team needs to care about how code gets written, how it's deployed, and the immediate feedback developers can get on how their applications run in production environments. The Platform team will be responsible for maintaining the Continuous Improvement layer, which will provide the Technology teams with the tools needed to implement every step of a mature SDLC, abstracting the complexity of interacting with this layer through APIs, ChatOps bots, or other means.
Continuous Integration and Delivery is one of the pillars of the DevOps culture: it simplifies the release of new products and features, reduces the time to market, and allows a business to be competitive. The Platform team should select the right tools to simplify the deployment of applications to the infrastructure, automating and integrating these tools to provide an abstraction layer that developers can easily use to deploy their applications with minimal intervention (or no intervention at all) from the Platform team.
Continuous Deployment can be supported by the Platform, but by its nature it requires a mature and refined DevOps culture and strong support from automated testing.
The Platform team should be able to provide the mechanisms to test the software the technology teams are developing throughout the whole SDLC: for example, Unit and Integration testing during the Continuous Integration step, Smoke and Regression testing during the Continuous Delivery step, and periodic runs of the tests that the AQA (Automated Quality Assurance) teams create to verify that applications behave as expected in production environments.
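The ordering of test stages described above can be sketched as a tiny pipeline runner that executes each stage in sequence and stops at the first failure, so that broken code never reaches the later steps. This is a conceptual illustration only; `run_pipeline` and the stage checks are hypothetical stand-ins for real test suites and CI/CD tooling.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failing stage."""
    report = []
    for name, check in stages:
        passed = check()
        report.append(f"{name}: {'passed' if passed else 'failed'}")
        if not passed:
            break  # later stages never run on a broken build
    return report


pipeline = [
    ("unit tests", lambda: True),  # Continuous Integration
    ("integration tests", lambda: True),  # Continuous Integration
    ("smoke tests", lambda: False),  # Continuous Delivery (simulated failure)
    ("regression tests", lambda: True),  # Continuous Delivery
]

for line in run_pipeline(pipeline):
    print(line)
```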
The incredible pace at which technology advances these days has also brought challenges in how we secure and protect our critical systems. The term DevSecOps has become popular; it focuses on bringing the DevOps culture to the security landscape, to efficiently implement security best practices in the SDLC. So integrating things like static code analysis, vulnerability scanning, and penetration testing into the CI/CD pipelines has become necessary to protect our most valued assets. DevSecOps practitioners should collaborate with the Platform team to make security a critical part of the Platform.
Observability in every layer of the Platform is vital to understand how it is performing. The Platform team should implement the mechanisms to provide feedback to the technology teams on how their applications and systems are behaving in Production. The Platform team will pick the right tools for infrastructure monitoring, log management, application performance monitoring (APM), and uptime monitoring. These tools will help the technology teams gain a deep understanding of how their applications are behaving, quickly detect problems, reduce the time it takes to remediate them, and use the appropriate channels to keep everyone in the loop.
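As a rough idea of what application-level instrumentation can look like, the sketch below records how long an operation takes so the numbers could later be shipped to a monitoring tool. It is a minimal, in-memory illustration; the `timed` helper, the `durations` store, and the operation name are hypothetical, and a real setup would rely on the client library of whichever monitoring tool the Platform team selects.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator, List

# In-memory store of durations (in seconds) per operation name.
durations: DefaultDict[str, List[float]] = defaultdict(list)


@contextmanager
def timed(operation: str) -> Iterator[None]:
    """Record the duration of the wrapped block, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations[operation].append(time.perf_counter() - start)


with timed("load_orders"):
    sum(range(100_000))  # stands in for real work

print(len(durations["load_orders"]))
```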
The Platform team shouldn't work as a silo; instead, it should be transparent and clear about how the whole Platform operates. So knowledge transfer is another responsibility of the Platform Team: documenting every layer of the Platform and holding knowledge transfer sessions with the technology teams to explain and discuss how the different components of the Platform work. The Platform team could also perform code reviews and software architecture reviews to make sure technology teams meet the software quality requirements needed to make efficient use of the Platform.
We've been talking about The Platform team, but who is the Platform team, and what kind of roles belong to it? At the end of the day, the Platform team is a product team whose clients, users, and stakeholders are the technology teams that use the Platform to release new products and features for the end-users. It needs to be composed of individuals whose seniority and expertise enable them to lead the design, implementation, and maintenance of a platform that will work as a foundation for all the products developed by the technology teams.
My vision of a Platform Team would be something along these lines:
Product Owner: a Technical Lead or Principal Engineer who is capable of understanding the vision of the Platform and guiding the team on the mission of building it. This person should be able to work together with the technology teams to make the best use of the Platform and to serve the business needs. This person will create the roadmap for building the Platform and define the deliverables of each phase of its development, so the rest of the teams can start using it and provide feedback to continuously improve it. The Platform team will collaborate with all the Technology teams, so the number of requirements will be high. The Product Owner will need to prioritize the development of the different parts of the Platform to satisfy the most urgent needs and reduce the time to market of the other teams' products delivered through the Platform, even while the Platform is still in development.
DevOps Advocate: this role is for someone who fully understands the Software Development Lifecycle and helps the Product Owner select the right tools to implement a DevOps culture. This person will own the Infrastructure and Continuous Improvement layers of the Platform and their implementation. This person will be the DevOps evangelist, responsible for spreading this culture across the technology teams. Infrastructure creation falls under this role, so using practices like Infrastructure as Code and Configuration Management is critical to simplify infrastructure management. This person will also collaborate with the Security Team to successfully implement security best practices in the Platform.
Automation Engineer: this person will be in charge of automating and integrating most of the parts of the Platform so they work together, reducing maintenance complexity. This person will also be in charge of creating the abstraction layer needed to expose the Platform as a service to all the technology teams, through APIs, ChatOps bots, etc., so they can make the best use of it.
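To give a feel for what a ChatOps entry point might involve, here is a minimal sketch that turns a chat message into a validated deployment request. The command syntax, the environment names, and the `DeployRequest` and `parse_command` names are all assumptions made up for this example; a real bot would sit behind a chat tool's API and hand the request to the deployment pipeline.

```python
from dataclasses import dataclass

ALLOWED_ENVIRONMENTS = {"staging", "production"}  # assumed environment names


@dataclass(frozen=True)
class DeployRequest:
    service: str
    version: str
    environment: str


def parse_command(message: str) -> DeployRequest:
    """Parse 'deploy <service> <version> to <environment>' into a request."""
    words = message.strip().split()
    if len(words) != 5 or words[0] != "deploy" or words[3] != "to":
        raise ValueError("usage: deploy <service> <version> to <environment>")
    _, service, version, _, environment = words
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    return DeployRequest(service, version, environment)


print(parse_command("deploy checkout 1.4.2 to staging"))
```

Validating the command before anything runs is what lets developers trigger deployments themselves without needing to know how the underlying tooling works.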
Software Architect: this person will work as an internal consultant with the technology teams, helping them design and implement the software they are developing. He/She will also develop tools, libraries, and boilerplates to help developers focus only on the business logic of the software they are creating, reusing code and functionality as much as possible. This person will be the Platform evangelist with the development teams, taking requirements and shaping them so they can be built seamlessly on top of the Platform. The Software Architecture layer will be his/her responsibility.
Project Manager: this person will help the team manage the projects that the Product Owner defines, using the most suitable methodology. Almost all projects of the Platform Team are purely technical and invisible to the business. They will vary in length and complexity, so the Project Manager should be capable of deciding which methodology to use to track the status of each project and manage its delivery timeline, always prioritizing projects that are critical for the business.
As with any system, the Platform's value relies on its capability to evolve and adapt as the business grows and changes. Continuous improvement should be the core value of the Platform and be practiced by the Platform Team.
Every year the Platform Team should hold a series of meetings to review the State of The Platform and discuss the vision of the Platform and how it is becoming reality. As with any iterative process, the goals can change and adapt to the way the company operates. After completing the review, the Platform Team will have a plan for how to continue evolving the Platform, including the necessary changes and the features to implement. The next step for the Platform Team should be to split the plan into projects that are measurable in terms of time, cost, and resources, and to discuss them with the Technology teams, as some of these projects will result from the feedback they give to the Platform Team, so that things like timelines, priorities, and requirements can be defined.
Each quarter the Platform Team should meet and perform a general review of the current status of the Platform: what is pending, what can be improved, and what is completed. Based on this, the Platform Team will define the plan for the quarter.
Building an Engineering Platform and a Team that supports it is not a new idea; I borrowed ideas from several articles on the subject (check the References section at the end of this article for some great ones). My curiosity around this came from a discussion with the Director of Engineering at the company I work for, when we were discussing a re-architecture of our backend services and how developing an underlying "Platform" would support that project. I've been implementing these things for the last five years, so I understand the value and importance of having this kind of team in any organization. So this article is my take on what an Engineering Platform, and the Team that makes it possible, could look like, based on my own experience working with some of these technologies and being a DevOps practitioner.
I know this idea might not be perfect; some people will disagree with parts of it, and some may find that certain things do not work for them. But as a practitioner of Continuous Improvement, I believe it's a matter of taking the feedback you get from this kind of re-engineering process, refining the idea, and coming up with a better approach. There is no silver bullet for the challenges imposed by the competitive world in which organizations must keep pace and evolve to remain relevant, so re-thinking their processes and finding the sweet spot for achieving competitive advantage is critical.
I am a true believer that building an Engineering Platform could benefit any company, from a startup to a pre-cloud era organization planning its move to the cloud. An Engineering Platform could help not only Product Teams but also other technology teams like Data Engineering and Data Science teams, so DevOps is not exclusive to traditional Development teams. Concepts like MLOps, DataOps, and AIOps have been gaining traction, showing that there is a need for tighter collaboration between technology teams and a Platform Team that helps them be more effective in delivering solutions that help any company flourish. So leveraging the power of a Platform could benefit any technology team as a robust enabler for Digital Transformation.