One of my goals for 2019 was to improve my skills and knowledge around Product Delivery. What is Product Delivery? I’d define it as a set of practices, attitudes, and tools which can be used to improve the quality of software and the speed at which it’s delivered to customers. It’s a large field which takes in areas of Lean Product Development, Agile, Software Development, DevOps, and SRE. My approach was to read a lot of books and blogs on these topics and discuss the ideas with colleagues around the office. I’ve written this from the point of view of a developer on a product team at a large tech company, but a lot of these ideas are also applicable to solo entrepreneurs and those working on side projects.
Accelerate describes Elysium for Software Companies, what world class engineering looks like. The authors list these four key software delivery metrics:
- Lead Time - Time taken from when a feature is agreed upon until it is running in production. Keeping this short allows more features to be developed, as well as a short turnaround time on bugs.
- Deployment Frequency - How frequently software is deployed.
- Mean Time to Restore (MTTR) - After a failure in production how quickly service is restored.
- Change Fail Percentage - Percentage of deployments to production which fail.
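To make the four metrics concrete, here is a rough sketch of how a team might compute them from its own deployment history. The `Deployment` record and its field names are my assumptions for illustration, not definitions taken from Accelerate, and restore time is measured from the moment of the failing deployment for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    agreed_at: datetime                     # when the feature was agreed upon
    deployed_at: datetime                   # when it was running in production
    failed: bool = False                    # did this deployment fail in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def delivery_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarise the four metrics for a non-empty list of deployments."""
    failures = [d for d in deployments if d.failed]
    restore_hours = [
        hours_between(d.deployed_at, d.restored_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "lead_time_hours": mean([hours_between(d.agreed_at, d.deployed_at) for d in deployments]),
        "deployments_per_day": len(deployments) / period_days,
        "mttr_hours": mean(restore_hours) if restore_hours else 0.0,
        "change_fail_percentage": 100 * len(failures) / len(deployments),
    }


# Made-up data for one week: four deployments, one of which failed.
sample = [
    Deployment(datetime(2019, 1, 1, 9), datetime(2019, 1, 2, 9)),
    Deployment(datetime(2019, 1, 2, 9), datetime(2019, 1, 4, 9),
               failed=True, restored_at=datetime(2019, 1, 4, 10)),
    Deployment(datetime(2019, 1, 5, 9), datetime(2019, 1, 5, 21)),
    Deployment(datetime(2019, 1, 6, 9), datetime(2019, 1, 7, 21)),
]
metrics = delivery_metrics(sample, period_days=7)
```

Even a spreadsheet version of this is enough to start a conversation in a retrospective; the point is to have numbers you can watch move over time.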
Other metrics are described throughout the book and a picture is built up of a high performing company. Companies are also described in terms of their technical capabilities and practices. Some examples of operational capabilities are continuous deployment of code starting in low traffic regions using a blue/green deployment strategy, or a monitoring system which can automatically cancel and roll back the deployment if error rates spike. Essentially anything to the far right of this agile fluency model. We all want to reach this mythical place, and no doubt some of your systems are there already. But for those which aren’t, how do we actually get there and what changes can you start making in your team today?
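As an illustration of the "cancel and roll back if error rates spike" capability, here is a minimal sketch. Everything in it is an assumption on my part: the `deploy`, `rollback`, and `error_rate` callables stand in for whatever your deployment tooling and monitoring actually provide, the 1% threshold is arbitrary, and it does not model the blue/green switch itself.

```python
from typing import Callable, Sequence


def progressive_rollout(
    regions: Sequence[str],
    deploy: Callable[[str], None],
    rollback: Callable[[str], None],
    error_rate: Callable[[str], float],
    max_error_rate: float = 0.01,
) -> bool:
    """Deploy region by region and undo everything if errors spike.

    `regions` should be ordered from lowest to highest traffic.
    Returns True if every region was deployed, False if the rollout
    was cancelled and rolled back.
    """
    done: list[str] = []
    for region in regions:
        deploy(region)
        done.append(region)
        if error_rate(region) > max_error_rate:
            # Cancel the rollout, rolling back the newest region first.
            for deployed in reversed(done):
                rollback(deployed)
            return False
    return True


# Fake tooling so the sketch runs on its own.
observed = {"small": 0.001, "medium": 0.20, "large": 0.001}
log: list[str] = []
ok = progressive_rollout(
    ["small", "medium", "large"],
    deploy=lambda region: log.append(f"deploy {region}"),
    rollback=lambda region: log.append(f"rollback {region}"),
    error_rate=lambda region: observed[region],
)
```

A real system would watch the error rate over a time window after each step rather than sampling it once, but the shape is the same: small steps, a fast signal, and an automatic way back.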
The DevOps Handbook puts some attainable goals on the ideal described by Accelerate. The book looks at the core philosophy of DevOps and Lean - creating systems with fast feedback loops. It takes this concept as a seed and explores it in the different domains of software delivery including agile practices, development techniques, code architecture, source control practices, continuous integration, testing, deployment, and monitoring. There are plenty of interesting interviews, case studies, and stories from Google and Etsy about how they used DevOps practices to better their process.
Whereas Accelerate describes these software delivery practices in the context of a company, The DevOps Handbook describes these practices in the context of the team. This made it much easier to draw links between what was discussed in the book and my actual day to day work. This was a great book in terms of actionable items and gave me far too much to think about across all areas of software delivery. Given there are so many improvements a developer can make in their team, what should they prioritise and where should they start?
Extreme Programming Explained [the XP book] by Kent Beck isn’t like these other 4 books in that it was published in 2004 (first published 1999) and predates the word DevOps. Beck is obsessive about optimising for customer value and sustainable development. He puts aside the cargo cult adoption of new shiny tools ‘for the sake of it’ and puts emphasis on only doing things which improve the customer experience. It’s not always about the tools that we use, but it is always about the value we deliver to our 2,000,000 customers. Framing the picture this way gave me a clearer idea of what tools and concepts to adopt from Accelerate and The DevOps Handbook. If you’re a Developer and haven’t heard of Extreme Programming then I recommend picking up this book. Beck describes a very opinionated way to do software development. You don’t need to agree with it all, or adopt it, but it made me reflect on how I approach software development.
The Phoenix Project is a fictional tale which looks at a company struggling with an ineffective work process, bureaucracy, and bad communication. It’s a really easy read and doesn’t get down into any technical detail. I’d recommend it to anyone who’s looking to understand the why of DevOps and Agile.
DevOps for Finance provides a rebuttal to the phrase “we’d like to use DevOps but we operate in a regulated industry” - a sentence I said verbatim to a colleague when describing my last job. A couple of weeks later he recommended I read this. The book takes examples from banks and trading firms, an area where continuous deployment can’t really be practiced. My takeaway from this book was to look closely at the regulation and the requirements to find out exactly what you can and can’t do. Software running stock exchanges can’t be re-deployed in the middle of a day of trading, OK. But the same firm could invest heavily in an automated testing infrastructure and a one click deployment which will make deployments, when they happen, so much easier. Identify some areas that you can improve and invest heavily.
What have I learned?
Delivering value to the customer is the team’s priority. Obsess over it. It should influence every decision you make with regards to team structure, way of working (agile), tech stack, system architecture, code quality, testing, delivery practices, and operations tools. If your team is making a decision which affects the product and no one has talked about how that decision affects the customer, then bring it up.
Any bit of pain you encounter between when a feature begins development and when it is in front of customers will amplify. Address it as soon as you can. If your local development environment is difficult to set up, if there are flakey tests in a CI environment, if deployments often fail because of timeout issues, if production deployments take 3 hours, if you look at production logs and have to pick out what is actually happening from an acceptable baseline of errors, then stop and do something about it. When you ignore the feedback you’re fighting the system. Fixing the system improves your team health through reduced stress and fear, and the time that was being spent fighting the system can be used to deliver value to customers or to browse Hacker News. By sticking to the status quo you’re inviting the same failures and pain that you’re currently experiencing.
De-couple deployments from releases by using feature flags. This gives a couple of big benefits:
- You get continuous deployment
- If a released feature starts causing errors in production, you can flip a feature flag to disable it, which is quicker than doing another deployment.
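Here is a minimal sketch of the idea, assuming a hypothetical in-memory `FeatureFlags` store and a made-up `bulk_discount` flag. Real flag systems are usually backed by a config service or database so that flags can be flipped without a deployment, but the calling code looks much the same.

```python
from typing import Optional


class FeatureFlags:
    """Tiny in-memory flag store (hypothetical, for illustration only)."""

    def __init__(self, initial: Optional[dict[str, bool]] = None) -> None:
        self._flags = dict(initial or {})

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so deployed but unreleased code stays dark.
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def checkout_total(prices_in_cents: list[int], flags: FeatureFlags) -> int:
    total = sum(prices_in_cents)
    if flags.is_enabled("bulk_discount") and len(prices_in_cents) >= 3:
        # New behaviour: already deployed, but only released when the flag is on.
        total -= total // 10
    return total


flags = FeatureFlags()
basket = [1000, 2000, 3000]

before_release = checkout_total(basket, flags)  # flag off: old behaviour

flags.set("bulk_discount", True)                # release, with no deployment
after_release = checkout_total(basket, flags)

flags.set("bulk_discount", False)               # something broke: switch it off again
after_disable = checkout_total(basket, flags)
```

The deployment put the discount code into production; the release was a separate decision made by flipping the flag, and undoing it is just as cheap.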
Low risk deployments - Developers should feel 100% confident in doing a code deployment. Feedback loops feature big time in this. If CI tests are flakey or it takes a long time to deploy then the incentive for the developer is to batch up as many changes as possible. This just leads to fewer, high risk deployments.
Sustainable development - Teams lacking motivation will produce software lacking quality. Look after yourself and each other. Balance the challenges of work with opportunities in life. Set goals. Reflect on team health in retrospectives - it can be anonymous. If something is off then take corrective action.
It’s up to you - You can make effective change in your team today. Chat with your team to get alignment on the parts you are trying to change. You don’t need permission from your manager, or their manager.
Cross functional teams enable Humans, not Heroes - In a hero culture only a handful of people know how particular systems work and are able to fix them. These systems may be opaque to others due to their complexity and/or lack of documentation. In a cross functional team every member knows how the entire system works. This encourages a virtuous loop of more pair and mob programming, sharing knowledge, and ensuring any new work is implemented in an understandable and well documented way.
Monitoring matters - If you can’t see how your app is running in production then you’re blind. If you run a popular product then you could just monitor Twitter to see if there’s any downtime or bugs, but there are so many better tools at our fingertips. New Relic Browser for client side performance monitoring. New Relic Synthetics for client side availability. Hook these up to alerting. Have an arsenal of dashboards, graphs, and log search terms at hand ready to check in the case of something going wrong.
Some thought experiments
I’ve taken these down as notes throughout the year. I think they’re somewhat original, but almost certainly influenced by the material I’ve read and people I’ve talked to.
Teams should have measurable goals and structure how they work to achieve them. For example a product development team might want to assess how effectively they can deliver value to the customer so they might measure:
- Speed of delivery of new features to the customer (aka Accelerate Lead Time)
- Defect rate
The first looks at delivery speed, measured from when a ticket is picked up from the backlog until it is deployed in front of customers. The second counts defects, no matter how minor. Most defects degrade the customer experience, and all defects require your team to go back and fix them. A lower defect rate requires less re-work so features should be able to be shipped faster. These metrics don’t measure if you’re ‘building the wrong thing’, only your delivery speed and ability to pivot. Pick whatever goals you want, the key is that you can measure them and reflect on them in a team retrospective.
The two certainties of software development:
- There will be new requirements
- There will be defects in the software
Design system architecture and write code with this in mind. (Stolen from Kent Beck but worth repeating).
Defects as Trust - View the defect rate within your software as a lever for team trust. Most of us have been through severe production incidents at one job or another. It makes you feel like shit, puts the team in a state of pressured unease, and makes the team and stakeholders more risk averse than they were the day before. My hypothesis is that low/no severity defects still cause distrust within the team. Even if there is no damage to the customer, individuals in the team question their own skills, and may come across more risk averse which affects normal operations of an otherwise functioning team. This is another reason to track the defect rate and actively try to minimise it.
Start Improving Somewhere. Unsure where to improve quality in your code delivery process? Imagine that starting tomorrow your project will be continuously deployed. What things in your delivery pipeline or development process do you need to improve to allow this to be implemented? Are there flakey or slow running tests which need to be fixed? Is there enough monitoring in place that you’d be confident for a deployment to happen at any time? Are big changes typically hidden inside feature flags? These are all areas you can jump in and start improving the delivery quality of your product.
For something as lucrative as product delivery I find it funny that there are no prospective studies which test different development and delivery techniques (can someone link me to any?). Perhaps Google and Amazon have internal studies which they don’t publish, but in the public domain all we have is Accelerate which is a retrospective look at ‘high performing companies’. Accelerate is very good, but it leaves the rest of us in a position in which we can only copy the software delivery metrics and practices described in Accelerate without knowing the why. From what I can tell a given theory becomes accepted by the community when a company known for engineering and product success (Amazon, Netflix) publishes some glimpse of how they work. Books and blog posts get written, and then these practices gradually become a standard way of doing things. I haven’t gone anywhere near the business/MBA type of readings so I’m likely missing something, but I’d assume that any good studies from those would be widely shared in the DevOps community.
Learning what I have, so far, has shown me that there are a huge number of opportunities where one can drive improvement. Read books, read blogs, talk to people from other teams, participate in communities of practice. Learn, observe, and share. Apply these new skills and lead growth of product delivery techniques in your team and company.