Cargo Culting Software Engineering Practices

April 16 2022 · tech software-engineering

Kent Beck is one of my software engineering role models. Three months into my first Big Tech gig I was talking to a mentor about the overhead involved with agile: meetings for standups, tech huddles, refinements, retros, and product reviews; processes such as limiting work in progress, desk checks, spikes, stories, and tasks. Coming from a startup, we had delivered high-quality software without all of that labelled process. A mentor lent me his copy of Extreme Programming Explained by Kent Beck, and that book gave me a framework to develop my own philosophy about how to practice software delivery in an agile context. Since then I’ve followed Beck’s writings and talks.

I recently came across an interview with Beck from 2019 in which he talked about his time at Facebook. He joined Facebook in 2011, when the organisation had 700 engineers, and left in 2018. The interview goes into a lot of technical detail on the interesting scaling problems organisations like Facebook encounter. What interested me most, and what this post is about, is the discussion of engineering practices: agile ways of working, test strategies, design patterns, and feedback loops.

It’s 2011 and Beck, already a software engineering luminary, joins Facebook. He expects it to follow common and uncontroversial software engineering practices, perhaps even some of those he writes about. Instead:

It’s crazy, it looks like a clown show and yet, things are working really well. They weren’t doing the things in my books. I like to joke, I don’t mind if people don’t do this stuff in my books. I just want them to fail. They weren’t doing that. I thought, well – My first thought is I’ll come in and explain how this stuff works.

In the back of my mind, there is this mystery of this bumble bee. In theory, this process should be a disaster and in practice, it’s working extremely well at two things at the same time; at scaling and at exploration. I wanted to figure that out.

During a hackathon he runs a class on Test-Driven Development.

Nobody’s using TDD, so well, of course, they’ll want to learn from me … The class just before mine on the list of classes was about advanced techniques in Excel and it was full and there was a waitlist. The class just after mine was on Argentinian tango and it was full and there was a waitlist. I had zero people sign up for my class. I thought, “How am I going to have any impact here, if people don’t listen to me?”

On starting to write code:

In boot camp, you’re supposed to put code into production the first week. I was very careful to write tests and do everything properly. I got in a fair amount of heat, because my first feature didn’t land for three weeks. People are like, “Man, I don’t know how this is going to work out.” Well, and I was wondering that too. I had a huge case of imposter syndrome when I landed at Facebook and realized just how different everything was.

Then the tests that I had written broke almost immediately. They were deleted. That was one of the things that surprised me. If you had a test and it failed, but the site was up, they just delete the test. If you had tests that were intermittent, that were non-deterministic, they were just deleted. At first, I was shocked. Like, delete a test. This is producing noise and it’s not producing signal. If you eliminate this noise production, per definition the situation is clearer all of a sudden. The fact that you wish that you had a test for something, well you didn’t. Yeah, just chuck it and let’s move on.

When in Rome…

I deliberately chose to forget everything I knew about software engineering. I just said, “I’m going to try and be a programmer and I’m going to watch what people do. I’m just going to copy what they do. If somebody says this is two diffs instead of one, it will be two diffs. If somebody says you need tests for this, I’ll write tests. If they say you don’t need tests for that. Why are you writing tests? Then I won’t write tests, even if I think that’s my – that’s the natural thing to do.

That’s the only way that: One, that I was going to be able to explore this mystery of how this software engineering process worked. Two, this is the only way that I was going to have any influence, because clearly, nobody was going to listen to me based on reputation.

What engineering practices can other companies and startups learn from Facebook?

Nothing. People should figure out what their style is and do their style. I’ve been talking about software process for a long, long time. Something I notice is there are people who are uncomfortable taking responsibility. They want a process where they can say, “Well, hey, we executed the process. We failed, but we executed the process.”

I think losing that and realizing that there’s no such thing as a technical success, that you’re all in it together and that your process is your process and you should play with it, you should experiment with it, you should try out a bunch of ideas. In the end, it needs to be yours. I think that’s the real lesson. Facebook did that. They did things that weren’t conventional, not because they were unconventional, but because it made sense in the Facebook context.

Everybody should be doing that and not copying – Spotify is the flavor of the month. Well, let’s copy the Spotify model. Well, Spotify didn’t copy the Spotify model, so what makes you think copying is the right thing to do?

So how should one look at engineering practices then?

You can only be fairly certain of the things you’ve tried yourself in your context, and only as – and those decisions are like fish. A month later, you should definitely question their value. If you let go of this low, what should our process be and say, what should our process be for refining our process, if you let go of the need for an answer and you embrace answering as a continuous process, then I don’t think you can do better than that.

Just because there isn’t one way that works, there are a bunch of ways that work really badly. There are a few ways that work well, and I think of them as attractors in the space of process. There’s a few ways that work well and there are a whole bunch of ways that work horribly. The first thing to do is identify where you’re doing something that’s horrible and stop doing that.

We can write useful stories about software development process, but they’re stories, they’re not recipes. The person listening to the story is going to have to take it in and digest it and apply it in their own specific context, because every single day at every single company is a different context. Of course, the answer is going to be different. The inputs are different.

Thoughts on Teams

I extend Beck’s last comment and say that every single team at every single company is in a different context. Teams are made up of people with different experience, ages, skill sets, expectations, and working hours. The code bases teams work on vary in age, size, complexity, the domain knowledge required to develop them, the number of other teams who also work on them, how the code is deployed, and the performance and reliability required of the service. Teams should be refining their engineering practices and software delivery methodology to fit what they’re working on at the present time. Absolutely start with a cookie-cutter approach, some flavour of Kanban or whatever is trending on Twitter, but the team should always be tweaking it. Retros are great for this.

One way I’ve seen companies prevent teams from having autonomy over their software engineering practices is through job titles, both formal and informal. Scrum Master is an obvious one. Having a Software Tester, or being required to have people on-call, also has an effect. Testing is part of a Quality Engineering Strategy: ideally it should be automated, but if there is headcount for a tester, what effect will that have? Likewise, we should view on-call as part of a Reliability Engineering Strategy. As engineers we should be writing our software so it doesn’t fall over the second a cosmic ray hits the server (or the AWS Lambda floating in the ether). Mandating on-call sends a message to the team: whatever work you could do is not enough to avoid going on-call.

This Software Will Self Destruct in 3-2-1

All this talk of engineering practices got me thinking: what are the largest factors that influence the practices we adopt, and which of those can we never know ahead of time?

In an ideal world the software engineering we do, and the practices we use to build those systems, should be the absolute minimum required, where that minimum is some combination of product fit, reliability, maintainability, team happiness, and so on. There’s no reason to invest more time or effort in either the engineering or the way we do it than is needed. To get there, what if every time we started a project we talked about when it will be shut down? To live dangerously, what if we put a self-destruct function in the code?
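
As a minimal sketch of what that could look like: a hypothetical self-destruct check. The deprecation date, the 30-day warning window, and the check_self_destruct name are all illustrative assumptions rather than a recommendation.

    from datetime import date, timedelta
    import sys
    import warnings

    # Hypothetical shutdown date, agreed when the project was started.
    DEPRECATION_DATE = date(2023, 4, 16)
    WARNING_PERIOD = timedelta(days=30)

    def check_self_destruct(today=None):
        """Warn as the shutdown date approaches; refuse to start once it has passed."""
        today = today or date.today()
        if today >= DEPRECATION_DATE:
            # The "self-destruct": the service refuses to start past its agreed lifetime.
            sys.exit(f"Scheduled shutdown date {DEPRECATION_DATE} has passed; refusing to start.")
        if today >= DEPRECATION_DATE - WARNING_PERIOD:
            warnings.warn(f"This service is scheduled to shut down on {DEPRECATION_DATE}; "
                          "replace it or deliberately extend the date.")

    if __name__ == "__main__":
        check_self_destruct()
        print("service starting")  # the rest of the application goes here

Even if nobody would ship the hard stop, writing the date down up front forces the conversation about how much engineering the project actually deserves.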

We already do it for some things. For scripts that will run once, we don’t write unit tests or a changelog. For code bases we want to iterate on, we invest in the CI/CD stack. For systems that need to be reliable, we invest in logging and monitoring.

Would knowing the deprecation date of a service, before it’s written, affect the software engineering practices used? How about the deprecation dates of the APIs and platforms your service depends on? If the organisation’s CI/CD platform is going through a technology shift, then a team shouldn’t invest much time in the soon-to-be-deprecated platform.

The full interview with Kent Beck is available: audio, transcript

