GOV.UK came out of beta just over two weeks ago. In that time we've released over 100 updates. I want to talk a little about how we do that, but mainly to focus on why this approach makes for more stability and less risk for the service.
Before we begin I just want to say thank you to one of our Cabinet Office colleagues whose tweet last week spurred me to write this blog post.
https://twitter.com/Rchards/status/261152775608610818
The Big Bang Release
In some organisations, people fear releasing new applications or new versions of software. Lots of websites, especially large applications within large traditional organisations, don't change very often. Many will have fixed release schedules which might mean one release every six months or so. This means bundling up lots of changes into a single release, which is bad in at least two ways:
- The people using your service don't get new features and improvements quickly. It could be weeks or months before an improvement that only took a few days to finish is actually released for people to use
- By bundling up lots of new features you make the release more complicated. This complexity leads to lots of different ways the release can go wrong
The combination of complexity, risk and the infrequent nature of releases makes for a stressful event for all involved. No wonder most people don't like release day!
Practice Makes Perfect
Before we released the first beta version of GOV.UK in February, the person then in charge of the project, Tom Loosemore, asked if the release process would work. I checked, and reported back that we had run it successfully more than a thousand times by that point, so we were quite confident the process would work just fine.
Releasing software comes with risks, so trying to minimise those risks is prudent. We do that in a number of ways:
- By releasing smaller chunks regularly it's much easier to see what is going to change, and if something goes wrong it’s much simpler to roll that change back and undo it
- Doing something regularly makes the case for investing in automation easier. This helps remove much of the potential for human error and makes releases the same every time
- If you're doing something several times a day you tend to get better at it. Practice does make perfect (or as close as we can get)
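To illustrate why small releases are simple to undo, here is a minimal sketch in Python. It is not the GOV.UK release tooling; the `ReleaseLog` class and the version names are hypothetical and exist only to show that when each release is a single small step, rolling back just means returning to the previous one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReleaseLog:
    """Hypothetical record of released versions, newest last."""
    history: List[str] = field(default_factory=list)

    def deploy(self, version: str) -> str:
        # Each deploy is one small, identifiable change.
        self.history.append(version)
        return version

    def current(self) -> Optional[str]:
        # The live version is the most recent entry, if any.
        return self.history[-1] if self.history else None

    def rollback(self) -> Optional[str]:
        # Undo only the most recent release and report what is now live.
        if self.history:
            self.history.pop()
        return self.current()
```

With a log like this, deploying `release_101` and then `release_102` and calling `rollback()` leaves `release_101` live again, and only one small change has been undone.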
A Deployment Pipeline
Taking any software project and moving to a very rapid release schedule is hard, and requires up-front work on systems and processes to minimise the associated risks. For GOV.UK this meant:
- Building the system to be as redundant as possible. Intermediary caches, load balancing and architectures based on small composable services help to make the system more able to deal with many kinds of failure without affecting visitors
- Making extensive use of continuous integration, an automated process which tests our code on every change. Code that doesn't pass our test suite can't be released to production
- Having an extensive suite of monitoring checks for GOV.UK. We collect over 10,000 individual metrics every minute and have approximately 1,300 alerts set up which tell us about problems before they happen, and in some cases trigger automated failover without the need for human intervention
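Two of the ideas above, that untested code can't be released and that an alert can trigger failover on its own, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of our actual systems: the `Build` and `Alert` classes, the `error_rate` metric, the threshold and the backend names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Build:
    """Hypothetical result of a continuous integration run."""
    version: str
    tests_passed: bool


def can_release(build: Build) -> bool:
    # Code that doesn't pass the test suite can't go to production.
    return build.tests_passed


@dataclass
class Alert:
    """Hypothetical alert that fires when a metric exceeds a threshold."""
    metric: str
    threshold: float

    def fires(self, value: float) -> bool:
        return value > self.threshold


def choose_backend(value: float, alert: Alert, primary: str, standby: str) -> str:
    # Automated failover: switch to the standby when the alert fires,
    # with no human intervention needed.
    return standby if alert.fires(value) else primary
```

For example, with `Alert("error_rate", 0.05)`, a reading of `0.01` keeps traffic on the primary backend, while a reading of `0.20` switches it to the standby.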
Keep Improving
There’s still lots about our release process we want to improve. We'd like more consistency between how we release different applications, to create automated reports for auditing purposes, and to add even more monitoring checks. When you're using the release tools every day to ship new features it makes sense to keep working to improve them.
9 comments
Comment by This week at GDS | Government Digital Service posted on
[...] quiet work of releases, updates and improvements. It’s not as dramatic but it’s more effective, safer and serves our users [...]
Comment by A little blog on the side – Inside Inside Government | Government Digital Service posted on
[...] release updates continuously, so there’s a lot of changes to keep these stakeholders informed about. We could be meeting this [...]
Comment by Sysadmin Sunday 104 - Server Density Blog posted on
[...] UK Government: Regular Releases Reduce Risk [...]
Comment by Paul Swartout posted on
This is a great insight into how trust can be built within an organisation by implementing something that works every time you use it. Only by repeating the same process many many times will people trust you when you say "I'm confident it works". Adding extensive monitoring adds to this level of trust - as well as giving a clear insight into how the platform is behaving and reacting to changes.
When making any change to an on-line 24/7 solution is risky, having a process in place which reduces the risk by making changes frequently seems counter-intuitive but it works.
I'm pretty sure there's a vast amount going on under the covers (automated testing, CI, cache management, orchestration of load balancers, etc) and it would have taken quite a bit of time, effort and money to set up but as a UK tax payer I for one think this is great. CD and DevOps in action!
The fact that one of the most cumbersome and sloth like institutions in the world has embraced this way of working is very encouraging. Maybe this can provide some proof to other large organisations that CD and DevOps can and do work.
Comment by Etienne Pollard posted on
Worth taking a look at https://github.com/alphagov/ and reading through http://digital.cabinetoffice.gov.uk/2012/10/12/coding-in-the-open/ for more on our development approach.
Etienne Pollard
GOV.UK Programme Director
Comment by David Mytton posted on
This is a way to deploy an application where you have full control over the environment in which it is being deployed. It requires not only an easy way to push out code and changes but a way to monitor the impact then perform a rollback if necessary. Websites, applications and SaaS products are the perfect candidate for this because the developers maintain control.
It's less practical if you're deploying local software e.g. updates to a client web browser or self hosted email server. These updates generally happen on defined schedules and after testing. Rollback is more difficult and you want to do it in a controlled manner. But that doesn't mean you can't still do tests and CI to ensure that the releases are as high quality as possible.
Comment by This week at GDS | Government Digital Service posted on
[...] fixing bugs and making tweaks to copy and code wherever necessary on GOV.UK. As Gareth noted in his blog post earlier today, we’ve released more than a hundred updates to GOV.UK since [...]
Comment by Mike MacAuley posted on
Couldn't agree more Gareth. At the LGA we run a networking platform for the local government sector called Knowledge Hub. Our experience shows that weaning the process addicts off the release cycle fix (or fix release cycle) is much harder than one might expect. This is especially true if there are contracts, KPIs and penalty clauses in place.
We have a similar (if not so frequent) approach on the Knowledge Hub project at the LGA which is also run using an agile methodology. There is a monthly release process and things are bundled but the regularity focuses minds and gives some assurance to the operational team working with the platform that everything is moving forward all the time and actual progress is being made without waiting 4-6mths for anything to change.
I'd be very keen to know what your post-release testing processes are like? These are key in our experience.
Comment by Etienne Pollard posted on
Hi Mike
Good to hear that you're using agile to good effect at the LGA. Do you have weekly or fortnightly sprints internally, even if your releases are monthly?
Gareth and the others can give chapter and verse on our release processes, but in response to your question about post release testing I'd say that the key thing is that we focus on *pre*-release testing. We rely on the monitoring and alerting mentioned in the piece to pick up any problems, rather than doing explicit post-release testing after each release. That's because we only release something to the production environment once it's been successfully deployed to our preview environment (which means it needs to have a full set of automated tests, known as the "smoke tests" - see http://en.wikipedia.org/wiki/Smoke_testing#Software_development). This works because we maintain a "preview environment" that is identical in every respect to the production environment, and we're very, very strict about not releasing anything to production that hasn't been shown to work on preview.
Etienne Pollard
GOV.UK Programme Director