GOV.UK came out of beta just over two weeks ago. In that time we've released over 100 updates. I want to talk a little about how we do that, but mainly to focus on why this approach makes for more stability and less risk for the service.
Before we begin I just want to say thank you to one of our Cabinet Office colleagues whose tweet last week spurred me to write this blog post.
The Big Bang Release
In some organisations, people fear releasing new applications or new versions of software. Lots of websites, especially large applications within large traditional organisations, don't change very often. Many will have fixed release schedules which might mean one release every six months or so. This means bundling up lots of changes into a single release, which is bad in at least two ways:
- The people using your service don't get new features and improvements quickly. It could be weeks or months before an improvement that only took a few days to finish is actually released for people to use
- By bundling up lots of new features you make the release more complicated. This complexity leads to lots of different ways the release can go wrong
The combination of complexity, risk and the infrequent nature of releases makes for a stressful event for all involved. No wonder most people don't like release day!
Practice Makes Perfect
Before we released the first beta version of GOV.UK in February, the person then in charge of the project, Tom Loosemore, asked if the release process would work. I checked, and reported back that we had run it successfully more than a thousand times by that point, so we were quite confident the process would work just fine.
Releasing software comes with risks, so trying to minimise those risks is prudent. We do that in a number of ways:
- Releasing smaller chunks regularly makes it much easier to see what is going to change, and if something goes wrong it's much simpler to roll that change back and undo it
- Doing something regularly makes the case for investing in automation easier. This helps remove much of the potential for human error and makes releases the same every time
- If you're doing something several times a day you tend to get better at it. Practice does make perfect (or as close as we can get)
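To make the rollback point concrete, here is a minimal sketch of why small, frequent releases are easy to undo. It is an illustration only, not GOV.UK's actual release tooling: the `ReleaseLog` class, its methods and the version names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReleaseLog:
    """Hypothetical record of deployed versions, oldest first."""

    history: List[str] = field(default_factory=list)

    def deploy(self, version: str) -> str:
        # Each small release is recorded, so we always know what is live.
        self.history.append(version)
        return version

    def current(self) -> Optional[str]:
        # The live version is the most recent entry, if there is one.
        return self.history[-1] if self.history else None

    def rollback(self) -> Optional[str]:
        # Undo only the most recent release and report what is live now.
        if self.history:
            self.history.pop()
        return self.current()


log = ReleaseLog()
log.deploy("release_101")
log.deploy("release_102")
live_after_rollback = log.rollback()  # back to the previous small release
```

When each release contains one small change, rolling back removes just that change. With a six-monthly release, the same step would undo everything bundled with it.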
A Deployment Pipeline
Taking any software project and moving to a very rapid release schedule is hard, and requires up-front work on systems and processes to minimise the associated risks. For GOV.UK this meant:
- Building the system to be as redundant as possible. Intermediary caches, load balancing and architectures based on small composable services help to make the system more able to deal with many kinds of failure without affecting visitors
- Making extensive use of continuous integration, an automated process which tests our code on every change. Code that doesn't pass our test suite can't be released to production
- Having an extensive suite of monitoring checks for GOV.UK. We collect over 10,000 individual metrics every minute and have approximately 1,300 alerts set up. These tell us about problems before they happen and, in some cases, trigger automated failover without the need for human intervention
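The last two points can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the real GOV.UK pipeline or monitoring setup: the function names, the `Alert` class, the metric names and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


def can_release(test_results: Dict[str, bool]) -> bool:
    """Allow a release only if tests were run and every one of them passed."""
    return bool(test_results) and all(test_results.values())


@dataclass
class Alert:
    """Hypothetical alert: fires when a metric rises above its threshold."""

    name: str
    metric: str
    threshold: float


def firing_alerts(metrics: Dict[str, float], alerts: List[Alert]) -> List[str]:
    """Return the names of alerts whose metric is above the threshold.

    A metric that was not reported is treated as 0.0 here, which is a
    simplifying assumption made for this sketch.
    """
    return [a.name for a in alerts if metrics.get(a.metric, 0.0) > a.threshold]


# Assumed example values, chosen only for illustration.
alerts = [
    Alert(name="high error rate", metric="error_rate", threshold=0.05),
    Alert(name="disk nearly full", metric="disk_used", threshold=0.90),
]
latest = {"error_rate": 0.07, "disk_used": 0.50}
problems = firing_alerts(latest, alerts)
```

Setting a threshold below the point of real failure, for example alerting at 90% disk usage rather than 100%, is what lets a check warn about a problem before it affects visitors.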
There’s still lots about our release process we want to improve. We'd like more consistency between how we release different applications, automated reports for auditing purposes, and even more monitoring checks. When you're using the release tools every day to ship new features it makes sense to keep working to improve them.