Engineering Productivity, GenAI, Metrics, Toil

‘Boost software development productivity by 50x’ was the message on the expo floor at AWS re:Invent. Hundreds of vendors, from startups to enterprises, had booths marketing their developer tooling, platforms, and consulting services. GenAI has been in full swing for a couple of years now, and it has married up with the recent push for organisational efficiency. Most booths touted how the GenAI features they’ve recently added improve the productivity of software developers. What are you waiting for? Buy their tools.

Engineering Metrics, Productivity

In my day job I lead the Engineering Metrics initiative for a big tech company. We measure aspects of software delivery, reliability, and security for every system in the company and provide a single-pane-of-glass view that everyone from a developer to the C-level can interpret. Whether through technology trends or perceived financial constraints, the demand for measuring the productivity of software developers has never been higher. Productivity means different things to different people. When I’m asked if my team can measure developer productivity, I’ll set up a conversation with the person asking to learn about their context. A few generalised examples are below:

A platform team is not progressing on new initiatives. Their perceived productivity is low. The team deals with a lot of interrupt work to support customers using their platform.
A platform team wants to use org-wide developer productivity metrics to show the value in delivering a new version of their software. The new version allows teams to work faster.
A product team is not progressing on new initiatives. Their perceived productivity is low. The codebase they work on is old, hasn’t been well maintained, and is very difficult to deploy.

In almost all situations there is a more objective metric that’s easier to measure and more accurate than any metric of aggregate productivity. Teams already know why progress is slow or why costs are high. Typically it’s much better to measure aspects of toil, cost, or the things that are preventing a team from achieving subjective productivity than it is to measure productivity as some black-box metric of units of value delivered divided by cost.
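As a rough illustration of measuring toil directly instead of an aggregate productivity number, the sketch below computes the share of a team’s logged hours that went to toil. The `WorkItem` type, the category labels, and the hours are all hypothetical, made up for this example; a real team would pull the data from its own ticketing or time-tracking system and choose its own categories.

```python
from dataclasses import dataclass

# Hypothetical labels for work a team considers toil (assumption for this sketch).
TOIL_CATEGORIES = {"interrupt", "manual_deploy", "data_fix"}


@dataclass(frozen=True)
class WorkItem:
    category: str  # e.g. "feature", "interrupt"
    hours: float   # time logged against the item


def toil_ratio(items):
    """Return the fraction of total logged hours spent on toil categories."""
    total = sum(item.hours for item in items)
    if total == 0:
        return 0.0  # no logged work, so no toil to report
    toil = sum(item.hours for item in items if item.category in TOIL_CATEGORIES)
    return toil / total


# One illustrative week for a platform team dealing with interrupt work.
week = [
    WorkItem("feature", 12.0),
    WorkItem("interrupt", 18.0),
    WorkItem("manual_deploy", 6.0),
    WorkItem("feature", 4.0),
]

print(f"Toil share: {toil_ratio(week):.0%}")
```

A number like this is narrow on purpose: it says nothing about value delivered, but it points at a specific thing the team can reduce and re-measure.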

Defining toil, improving it, and iterating is nothing new. It’s what I recognise as one of the core tenets of the DevOps philosophy. Breaking down complex work makes it achievable. If we attempt to make ‘units of value delivered’ go up, or ‘cost’ go down, we can blindly stumble into unintended side effects. Five years ago the same developer tooling companies promised to sell us ‘DevOps tools’ that would improve productivity. We know that DevOps is a capability and culture that needs to be developed within a company, not a tool that can be bought off the shelf to realise wild productivity improvements.

CodeWhisperer at AWS

At re:Invent I attended the talk Boost Developer Productivity with AWS CodeWhisperer. CodeWhisperer is a co-pilot tool which helps engineers write and debug code. The talk described how AWS measures developer productivity, and then how these metrics changed after a portfolio adopted CodeWhisperer. Their developer productivity framework covers a few aspects:

System Health - Observability, reliability, etc.
Software Delivery Health - Delivery speed, frequency.
Team Health and Developer Wellbeing - That annoying survey you get sent every two weeks.

The portfolio adopted CodeWhisperer and saw strong movement in the Team Health and Developer Wellbeing metrics. Developers enjoyed using the tool, said they felt more engaged with the code-writing aspects of their jobs, and found the tool especially valuable for generating test cases. I asked the speakers whether they saw movement in the System Health or Software Delivery Health metrics, but I got a non-answer. I assume that if there had been a strong movement they would have covered it in the talk. By their definition, adoption of CodeWhisperer improved productivity. By other measures of productivity, it wouldn’t have.

I’m not surprised that changes weren’t observed in the System Health and Software Delivery Health metrics. The way metrics such as Delivery Lead Time and Service Level Objectives are measured means a range of factors, including those outside a team’s control, can impact them. As GenAI developer tooling evolves we might see it make more of an impact on software delivery.
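A small worked example shows why a faster coding step can leave delivery metrics nearly flat. It is only a sketch: the split of lead time into ‘coding’ and ‘waiting’ hours, the figures themselves, and the `coding_speedup` parameter are assumptions made for illustration, not numbers from the talk or from any real team.

```python
from statistics import median

# Hypothetical per-change durations in hours: (coding, waiting).
# "waiting" stands in for review queues, release windows, and other
# factors outside the authoring developer's control.
changes = [(4.0, 20.0), (6.0, 42.0), (2.0, 10.0)]


def median_lead_time(changes, coding_speedup=1.0):
    """Median lead time in hours when only the coding part gets faster."""
    return median(
        coding / coding_speedup + waiting for coding, waiting in changes
    )


baseline = median_lead_time(changes)
with_copilot = median_lead_time(changes, coding_speedup=2.0)

print(f"Baseline median lead time: {baseline:.0f}h")
print(f"With coding twice as fast: {with_copilot:.0f}h")
```

Under these made-up figures, doubling coding speed only takes the median from 24 hours to 22, because most of the lead time sits in the waiting portion.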

GenAI Creates Toil

I believe the most significant thing co-pilots and other software development GenAI tools do is move work that developers happily did into the category of toil. Specifically, they move a subset of software development tasks related to writing code and information retrieval (a.k.a. googling).

At first glance these aren’t toil. Toil is repetitive, manual, and time-consuming work: fixing broken pipelines, running data fixes, or having to manually configure a server. Toilsome work prevents us from writing valuable code. But that definition has been eroding. Large companies invest a lot of money in service templates and golden paths to achieve a couple of outcomes:

Faster zero to one for getting new services into production
Consistency in technologies and configuration of services

Those outcomes are achieved by skilled developers not writing code. I see GenAI tooling as the next step along the continuum. It moves the next segment of code, beyond the initial service template, from the category of value to the category of toil. CodeWhisperer and its competitors are working on features that allow companies to augment their GenAI model with code that the company provides. This will allow the co-pilot to generate code that uses internal libraries and follows internal coding standards.

It’s a similar story in information retrieval. Developer tooling such as IntelliSense and code completion brings valuable information to the developer’s eyes when they need it the most. Asking a co-pilot or chat-like interface a question in real time in the text editor is miles ahead of doing a web search for a particular bug in a library, opening five tabs of documentation, GitHub issues, and marketing copy, accepting all of their cookie policies, and spending ten minutes reading, aggregating, and understanding the issue.

These tools are best used by experienced software developers who can interpret the suggested code and apply judgement when incorporating it into their systems. A hugely undervalued boon of co-pilots is that they eliminate some instances of context switching. They evolve developer tooling by bringing even more code generation and information retrieval into the IDE.

Conclusion

Productivity means whatever you want it to mean. I don’t think it needs to factor into a company’s decision to adopt co-pilot tools. Developers will use the tools they want, and in two years’ time co-pilots will be enabled in IDEs by default anyway.

Co-pilots are approaching commodity status, joining the glowing halls of other ‘50x productivity’ tools such as CI/CD platforms. As with CI/CD platforms, there is value in companies choosing a blessed co-pilot to invest in. The future value proposition for businesses will be co-pilots that can generate code and make inline suggestions in accordance with the organisation’s coding standards and recommended libraries. This will make new and older codebases alike easier to work in.

Don’t fall into the productivity metrics trap.
