I recently read Accelerate - Building and Scaling High Performing Technology Organizations. Here I give a brief overview of the book and share some thoughts I had while reading it.
The authors are looking to answer questions such as “Do technical practices and automation impact software delivery?” and “Is a Lean approach to product management an important aspect of software development and delivery?”. They’ve done this through several rounds of surveys to people who know what Devops is, but don’t necessarily have to be following it. They got 23,000 responses from around 2000 different organisations sized between 5 and 10,000 employees. The survey asked Likert-type questions, where the respondent chooses from a range of options, ie. ‘Strongly disagree’ through ‘Strongly agree’. An example of a question in the survey is “How often their organisation deploys code for the primary service or application they work on” and were offered a choice of six options ranging from “On demand (multiple deploys per day)” through to “Fewer than once every six months”. It’s important to remember that this data was sourced from a closed question survey - so it’s in no way a definitive list, nor necessarily causal. It is a list of practices which high performing companies happen to follow.
Measuring Software Delivery Performance
The authors came up with these four metrics which could be used to measure the performance of software delivery:
- Lead Time - Time taken between when a feature is agreed upon until it is running in production. Keeping this short allows more features to be developed, as well as short turn around time on bugs.
- Deployment Frequency - How frequently is software deployed?
- Mean Time to Restore (MTTR) - After a failure in production how quickly is service restored?
- Change Fail Percentage - What percentage of deployments to production fail?
Lead Time and Deployment Frequency measure the velocity of delivery whereas MTTR and Change Fail Percentage measure the stability of a delivery. This is an agreeable definition of the term Software Delivery and achieves a nice balance. All four measures fit in the Lean approach discussed in Eric Ries' book The Lean Startup (another worthy read). Businesses can’t run experiments (think A/B tests) if they can’t quickly and reliably deliver software. Speed and stability are not a trade-off:
[Survey results] demonstrate that there is no trade-off between improving performance and achieving higher levels of stability and quality. Rather, high performers do better at all of these measures. This is precisely what the Agile and Lean movements predict, but much dogma still rests on the false assumption that moving faster means trading off against other performance goals, rather than enabling and reinforcing them.
Capabilities to drive improvement
Freshly copied from Appendix A, these are capabilities that the authors noted high performing companies execute to a high level.
Continuous Delivery Capabilities
- Use version control for all production code and configuration.
- Automate your deployment process.
- Implement Continuous Integration.
- Use trunk based development methods - keeping feature branches few and short lived.
- Implement automated tests.
- Support test data management - integration tests often need test data to run. This needs to be properly managed.
- Shift left on security - involve security early in the development process.
- Implement Continuous Delivery.
- Use a loosely couple architecture - allowing deployments to occur independent of other teams and services.
- Architect for empowered teams - Let teams pick their own CI and CD tooling.
Product and Process Capabilities
- Gather and implement customer feedback.
- Make the flow of work visible through the value stream - teams should know why it is they’re working on a particular feature.
- Work in small batches.
- Foster and enable team experimentation.
Lean Management and Monitoring Capabilities
- Have a lightweight change approval process - peer review is appropriate.
- Application and infrastructure monitoring should inform business decisions.
- Check system health proactively.
- Implement ‘Work in Process’ limits - reduces task switching overhead.
- Visualize work to monitor quality and communicate throughout the team - Dashboards, dashboards, dashboards.
- Support a generative culture - as outlined by Westrum.
- Encourage and support learning - have a learning culture, invest in the employees.
- Support and facilitate collaboration among teams - prevent siloing.
- Provide resources and tools that make work meaningful.
- Support or embody transformational leadership - enabling these other 23 things starts at the top.
On the face of it these all seem sensible. CI/CD? This is what we’ve been reading about on Hacker News the last 6 years. Regularly gathering customer feedback? Startup 101. Supportive and learning culture? Yeah of course, no one wants the opposite. So is this list helpful at all? I think it is if the people within the team aren’t the ones assessing themselves on their progress at it. Some of these items such as ‘automated deployment’ are just giant. I suspect the majority of developers within large companies say they have an automated deployment pipeline. It’s just that some also have manual deployment steps in and around the automated ones.
When you’re on a team you quickly get normalised to the way things are done. The automated deployment pipeline is good enough, tests cover almost all critical parts of the application, we can usually focus on an item of work and not be constantly context switching. But your normal drifts. Two years later you’ve had to add a few more manual steps to the pipeline, the developer who wrote a new feature left before they wrote tests, and you’re juggling varying types of work. But it still feels normal and you’re still relatively happy with it. At my last job I would have said we had an automated deployment pipeline. Jenkins would deploy the app, secrets, and server configs to production. Sure you had to manage the infrastructure, write the post release documentation, and do a few other manual steps but it was all automated… right? When I joined my new team and saw how they used Jenkins the deployment pipeline at my old job suddenly looked like a Reliant Robin. Sure I’d been reading about deployment pipelines for years on Hacker News but in retrospect it must have been with the lens that what we had implemented was good and normal. Maybe what we set up was good at the time and had just aged poorly, but it took a new point of view to see how wrong I was. I had the same eye opening feeling happened at a Devops gathering a few weeks ago. A colleague was giving a demo of the Jenkins and Octopus pipeline he had been working on. The features, particularly around automated post-release documentation and notifications were amazing. I suspect that unless an implementation is causing a team pain (ie. they are performing the capability poorly) then they’re unlikely to critically assess it for potential improvement.
So how does a team get around the trap of normal? How do they really know if they’re executing any of those capabilities at a high level or just using them in a middling sort of way? One way would be assessment, for experts in these domains to evaluate how teams are doing, but by that point teams have already put a lot of work into an apparent substandard implementation, this may cause tension. I don’t like this idea. Standardising these common implementations (CI, CD, architecture, test data) into templates is essential, but is only of immediate value to new teams. Templates would only cover some of the technical capabilities, they wouldn’t help a team develop a learning or sharing culture. Project owners, people leads, and agile team facilitators are all oriented around the process, lean management, and cultural capabilities of the team. I don’t know enough around this area to speculate so won’t offend those wonderful people by doing so :-) . One of the core messages of the book is that it’s a learning and continual development process. A team should never be in a situation where they say ‘we have automated deployments’ or ‘we encourage and support learning’ and stop paying attention to it. It should always be thinking about how it can improve the processes it has implemented. It’s not a destination, it’s a …
In this post I’ve focused on technical capabilities but it’s important to note that these are enabled by the capabilities in the realms of process, team structure, and management. All of these are discussed in detail in the book. Accelerate is very easy to read and I highly recommend it. Finally, a few quotes from the book:
On culture, something we should all remember, from Google, 2015.
Who is on a team matters less than how the team members interact, structure their work, and view their contributions
Failure Demand exists for software delivery as well.
Seddon invented the concept when he discovered that the movement of ‘telephone work’ from local bank branches to call centres in the 1980s caused an explosion in the volumes of demand – the number of phone calls soared. He found that the rise in call volumes was attributable to the creation of ‘failure demand’, i.e. customers calling a second time because the bank had failed to solve their problem on the first call.
On burnout and unhappiness at work.
Maslach found that most organisations try to fix the person and ignore the work environment, even though her research shows that fixing the environment has a high likelihood of success