Now that GOV.UK has replaced Directgov and BusinessLink, and departments are moving to Inside Government, we want to make sure that people visiting links to the old sites get where they need to be. We want them to be redirected to the correct page on GOV.UK, with no link left behind.
This post is about the tools we built to make that possible.
Finding the links
The first thing we needed was a list of everything we wanted to redirect: all the Directgov and BusinessLink URLs (links and Web addresses). This proved to be a fairly significant task - both sites had long histories and various different providers, so a comprehensive list of these URLs did not exist.
Instead, we collected our own lists from a variety of sources, including traffic logs, records of friendly URLs (shorter, more memorable links that redirect to longer URLs), and the results of spidering the sites.
This gave us a total of about 8,000 Directgov URLs and about 40,000 BusinessLink URLs.
Wrangling the URLs
Many of the lists of URLs existed in various spreadsheets, maintained by different people. We needed a canonical source of truth. So we built the Migratorator.
The Migratorator is a Rails app, backed by a MongoDB database. It allows multiple users to create one-to-one mappings for each URL, where the mapping consists of the source URL, status (whether it will be redirected or whether, no longer representing a user need, it is now gone) and, if applicable, the page to which it will be redirected.
As well as the mapping information, the Migratorator allows us to capture other useful information such as who has edited a mapping, tags showing information about the type of mapping, and a status bar showing how far through the task we are.
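To make the shape of a mapping concrete, here is a minimal sketch in Python. It is not the Migratorator's actual schema (that lives in a Rails app backed by MongoDB). The class name, field names and status strings are assumptions made for illustration, and the target URL in the usage note is made up.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical status names: the post only says a URL is either
# redirected or gone.
REDIRECT = "redirect"
GONE = "gone"


@dataclass(frozen=True)
class Mapping:
    """One-to-one mapping for a single old URL (illustrative only)."""
    source_url: str
    status: str
    target_url: Optional[str] = None

    def __post_init__(self):
        # A redirect must say where it goes; a gone URL must not.
        if self.status not in (REDIRECT, GONE):
            raise ValueError(f"unknown status: {self.status}")
        if self.status == REDIRECT and not self.target_url:
            raise ValueError("a redirect needs a target URL")
        if self.status == GONE and self.target_url:
            raise ValueError("a gone URL must not have a target")

    def http_status(self) -> int:
        # 301 for a redirect, 410 'Gone' otherwise.
        return 301 if self.status == REDIRECT else 410
```

For example, `Mapping("http://www.direct.gov.uk/sorn", REDIRECT, "https://www.gov.uk/example")` would be served as a 301, while a mapping with status `GONE` and no target would be served as a 410.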
Checking the mappings
We needed to confirm that the mappings were actually correct. We wanted several people to check each mapping, so we created the Review-O-Matic.
The Review-O-Matic is also a Rails app and uses the Migratorator API to display the source URL and the mapped URL in a side-by-side browser, with voting buttons.
We asked everyone in GDS to help us by checking mappings when they had some spare time. However, clicking through mappings can be dull, so we ran a competition with a prominently displayed leader board. The winner, who checked over 1,000 mappings, won cake.
Confirmation from departments
The Review-O-Matic presents the mappings in a random order, and the way it's set up means that links within pages cannot be clicked. This is good for getting as many mappings as possible confirmed, but our colleagues in departments needed to check content relevant to them in a more methodical and interactive way. Enter the Side-by-side Browser.
The Side-by-side Browser displays the old and the new websites next to each other. Clicking a link on the left hand side displays what this will redirect to on the right hand side.
The Side-by-side Browser is a Node.js proxy that serves itself and the site being reviewed on the same domain, so that it's 'live' and not blocked by the same-origin policy. We joked that, in essence, the Side-by-side Browser was a phishing attack for the good!
Initially it used the Migratorator API for the mappings. However, once we'd built and deployed the Redirector, we could use that instead to populate the right hand side. As well as simplifying the code, this meant we could now see what the Redirector would actually return.
At this point, we distributed it to our colleagues in departments to check the mappings and raise any concerns before the sites were switched over.
We used another trick to test Directgov mappings while the site was still live. We created a domain called aka.direct.gov.uk, which was handled by the Redirector, and a bookmarklet. By replacing the 'www' with 'aka' the bookmarklet allowed us to see what an individual Directgov page would be replaced with.
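The rewrite the bookmarklet performs is small enough to sketch. The real bookmarklet would have been JavaScript running in the browser; the Python below is only an illustration of the same hostname swap, and the function name `to_aka` is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def to_aka(url: str) -> str:
    """Swap a leading 'www.' in the hostname for 'aka.', leaving the
    path, query string and fragment untouched."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("www."):
        host = "aka." + host[len("www."):]
    return urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment)
    )
```

Given a page on www.direct.gov.uk, this returns the same path on aka.direct.gov.uk, which the Redirector then answers as it would after the switchover.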
The Redirector itself
For the actual redirection, we use the open-source Web server Nginx. The Redirector project is just the process for generating the Nginx configuration. It's written mainly in Perl with some PHP.
Generating the Nginx config requires logic to determine from the old URL what kind of configuration should be used.
For example, the important part of a Directgov URL is the path, e.g. www.direct.gov.uk/en/Diol1/DoItOnline/Doitonlinemotoring/DG_196463, while for BusinessLink the essential information is contained in the query string, e.g. http://www.businesslink.gov.uk/bdotg/action/detail?itemId=1096992890&type=RESOURCES. Redirecting these two types of URL requires different types of Nginx config.
This logic, plus the mappings we gathered, makes up much of the Redirector project.
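As a rough illustration of why the two kinds of URL need different config, here is a Python sketch that emits one plausible Nginx rule for each. The Redirector itself is written in Perl and PHP, and this is not its code or necessarily the config it generates. The function names, the `$new_url` variable name and the choice of an exact-match `location` for paths and a `map` on `$arg_itemId` for query strings are all assumptions.

```python
from urllib.parse import parse_qs, urlsplit


def directgov_rule(old_url: str, new_url: str) -> str:
    """Exact-match location keyed on the path of the old URL."""
    path = urlsplit(old_url).path
    return f"location = {path} {{ return 301 {new_url}; }}"


def businesslink_entry(old_url: str, new_url: str) -> str:
    """One line of a map block keyed on the itemId query parameter."""
    item_id = parse_qs(urlsplit(old_url).query)["itemId"][0]
    return f"{item_id} {new_url};"


def businesslink_map(entries) -> str:
    """Wrap map entries in a block that looks up $arg_itemId.
    An empty default means 'no mapping known for this itemId'."""
    lines = ["map $arg_itemId $new_url {", '    default "";']
    lines += [f"    {entry}" for entry in entries]
    lines.append("}")
    return "\n".join(lines)
```

The old URLs passed in need a scheme (for example `http://`) so that the path and query string are parsed correctly. A real generator would also have to escape or quote paths and targets containing spaces or other special characters, which this sketch ignores.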
The joy of tests
In addition, the project contains a suite of unit and integration tests, including one that runs every night at 5am. This test checks that every single URL in our source data returns a status code that is either a 410 'Gone' or a 301 redirect to a 200 'OK'.
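The rule that nightly test enforces can be sketched in a few lines of Python. This is not the project's test code. The `fetch` callable is a hypothetical stand-in for an HTTP client that does not follow redirects and returns the status code plus the `Location` header, if any.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical interface: fetch(url) -> (status_code, location_or_None),
# without following redirects.
Fetcher = Callable[[str], Tuple[int, Optional[str]]]


def mapping_is_healthy(url: str, fetch: Fetcher) -> bool:
    """True if the URL is a 410, or a 301 whose target returns 200."""
    status, location = fetch(url)
    if status == 410:
        return True
    if status == 301 and location:
        target_status, _ = fetch(location)
        return target_status == 200
    return False


def failing_urls(urls: Iterable[str], fetch: Fetcher) -> List[str]:
    """Every source URL that breaks the rule, in input order."""
    return [url for url in urls if not mapping_is_healthy(url, fetch)]
```

Passing the fetcher in as an argument keeps the rule itself testable without network access: a dictionary of canned responses can stand in for the live sites.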
For a few weeks before the launch we also ran the daily Directgov and BusinessLink logs against the Redirector to see if there were any valid URLs or behaviour we'd missed. By doing this we found that, for example, even though URL paths are in general case-sensitive, Directgov URLs were not, and users would therefore expect www.direct.gov.uk/sorn to work in the same way as www.direct.gov.uk/SORN.
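One simple way to honour that expectation is to normalise case on both sides of the lookup. The Python below is only a sketch of the idea, not how the Redirector handles it in Nginx. The function names and the target URL in the usage note are assumptions.

```python
def build_lookup(mappings):
    """Index mappings by lower-cased path so case is ignored on lookup."""
    return {path.lower(): target for path, target in mappings.items()}


def resolve(path, lookup):
    """Return the target for a path regardless of case, or None."""
    return lookup.get(path.lower())
```

With `lookup = build_lookup({"/SORN": "https://www.gov.uk/example"})`, both `resolve("/sorn", lookup)` and `resolve("/SORN", lookup)` return the same target.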
The final task was to point the DNS for the sites we're now hosting at the Redirector. Now users following previously bookmarked links or links from old printed publications will still end up in the right place on GOV.UK.
The configuration now has over 83,000 URLs that we've saved from link rot, but if you find an old BusinessLink or Directgov link that's broken then let us know.
Traffic through the Redirector is easing off as GOV.UK pages are consistently higher in the Google search results, but it's been really exciting making sure that we do our best not to break the strands of the Web.
Comment by SC posted on
Interesting 🙂 and very impressive.
One comment though: the 'side by side' browser link is dead 🙁 Ironic isn't it?
Comment by Paul Downey posted on
Ah, that's my fault. Sorry! I'm in the process of productising https://github.com/alphagov/side-by-side-browser as a standalone node app and moved the original project to our github attic: https://github.com/gds-attic/review-o-matic-explore
It certainly is ironic, and a lesson on the value of owning your own domains!
Comment by Anna Shipman posted on
The redirect has now been fixed. Thanks a lot for bringing this to our attention!
Comment by Kola posted on
A really good creative use of different tools/technologies to solve the problem. Any particular reason you used Node.Js for the side by side comparison?
Comment by Paul Downey posted on
Comment by Andy Paddock posted on
I just had a small battle with some old pages that came inside the Defence Gateway. Their previous Google ranking was damaging our ranking until we deployed the rel=canonical tag.
Comment by Agencies and Arm’s Length Bodies – the next phase of Inside Government | Government Digital Service posted on
[...] to move large and complex websites with millions of users onto GOV.UK. We know how to ensure that existing inbound links are redirected to the right place on GOV.UK. We also know that moving about 300 agencies and ALBs onto GOV.UK will [...]
Comment by Andy Marchant posted on
very impressive... thanks for releasing these as free tools too!
Comment by dazbert posted on
Where does this leave friendly URLs? Were they mapped at the same time using the tools?
I guess departments have been feeding back into the process as well?
Comment by Anna Shipman posted on
Yes, friendly URLs are also mapped using the tools, and we're continuing to work with departments on any outstanding ones.