We want to make it easier for users to find what they need on GOV.UK. To achieve this, we are:
- grouping all GOV.UK content into subject-based categories (creating a taxonomy)
- improving the content that sits within the taxonomy – because content that’s poorly titled, duplicative or in the wrong format is hard to find, no matter how good the taxonomy
We started by dividing the site into broad subject areas, such as education, transport and ‘coming to the UK’ (visas and immigration). These ‘themes’ will not necessarily be top-level categories in the new navigation; they’re simply a means of breaking the site into more manageable chunks for us to tackle.
We’ve already transformed education content, and in April we started on the transport theme.
All aboard
We at GDS obviously can’t do this alone. We need to work closely with the departments and agencies that own GOV.UK content.
For the transport theme, that means partnering with the Department for Transport (DfT) and its agencies.
DfT was keen to get involved, and confident that its agencies would be able to contribute people and time to the project. We soon agreed our approach with Sioned James, Head of Content and Digital at DfT, and Gavin Dispain, Digital Editorial and Publishing Manager. A memorandum of understanding was signed and work began.
Our route
We’ve broken the transport content transformation project into stages:
- Create an inventory of all the content in the theme.
- Audit the content (excluding some formats – news, for example – that we wouldn’t want to change retrospectively).
- Improve the content (where necessary).
- Tag the content to the new taxonomy and republish.
We also created the taxonomy to classify the content, which involved 4 steps:
- Generate a list of terms to describe the content.
- Group those terms into a rough taxonomy.
- Do user research to understand how users of transport information think about it (what tasks they need to complete, what words they use to describe their work, and so on), then feed that learning into the creation of the taxonomy.
- Iterate and validate that taxonomy by testing it with users.
The journey begins
We began by running a ‘discovery’. Gavin came to work with us at GDS so that we could:
- validate the inventory of transport content
- agree the content types in scope for audit (mainly guidance)
- review the questions we were asking in audits
- audit some of the content to see how quickly it could be done
We audited a ‘sub-theme’ of transport – rail content. We discovered we could audit more quickly than expected and finished in under a week, giving us time to also complete an audit of aviation content.
Next, we ran a ‘user identification workshop’, which doubled as a kick-off meeting for DfT’s agencies. We had both content designers and subject matter experts at the meeting.
After presenting an overview of the project, we ran an exercise to generate a list of user groups from across the transport domain. We needed this for planning user research.
We were now ready for DfT and its agencies to start auditing their content. We ran a series of one-day training sessions to introduce content designers to the process.
Once auditing began, we in GDS reviewed progress and ‘spot-checked’ audits to ensure consistency of approach across the agencies. We ran internal ‘content clinics’ to discuss tricky content issues that we would need to advise agencies on.
Get me a taxonomy
In parallel with the auditing, we started work on creating a taxonomy for transport content.
We didn’t want to review all the many thousands of transport-related content items to come up with terms to describe them. That would have taken far too much time (and would have driven us mad).
Instead, some of our developers figured out how to create a 950-item list of transport content that covered a wide variety of subject matter. In other words, we minimised the duplication of similar content items. This allowed us to generate a reasonably comprehensive list of terms to describe all transport content without reading all of it.
The result was 650 unique terms, like ‘driving licences’, ‘bus regulation’ and ‘maritime training’.
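As a rough illustration of the idea (not the code our developers ran), the Python sketch below keeps a content item only if its title is sufficiently different from every title already kept. The real approach, described in the comments below, used topic modelling; this sketch swaps in a much simpler word-overlap (Jaccard) score. The titles, function names and the 0.5 threshold are all assumptions made up for the example.

```python
import re


def tokens(text):
    """Return the set of lower-case words in a title (a crude content profile)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of words two sets have in common, from 0.0 (none) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_dissimilar(titles, threshold=0.5):
    """Greedily keep a title only if it is not too similar to any title already kept."""
    kept = []        # titles selected so far, in input order
    kept_words = []  # the word set for each kept title
    for title in titles:
        words = tokens(title)
        if all(jaccard(words, other) < threshold for other in kept_words):
            kept.append(title)
            kept_words.append(words)
    return kept


# Hypothetical titles, invented for illustration only.
titles = [
    "Apply for a provisional driving licence",
    "Renew a provisional driving licence",
    "Bus service regulation",
    "Maritime training and certification",
]
sample = pick_dissimilar(titles)
```

Here the second title shares most of its words with the first, so it is dropped, while the bus and maritime titles are kept. A word-overlap score only notices shared vocabulary, which is one reason a topic model is a better fit for a real inventory.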
Next, we grouped the terms. We printed each one on a card, laid them on a big table and arranged them into groups of similar terms. The results were transferred into a spreadsheet to create our – very rough, very flat – draft taxonomy, which we’ll start testing with real users over the coming 3 months.
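To show what a ‘very flat’ draft taxonomy might look like once it leaves the table and lands in a spreadsheet, here is a small Python sketch. It is only an assumption about the shape of the data: the three terms come from this post, but the group names, column headings and function names are hypothetical.

```python
import csv
import io

# Hypothetical groups; only the three terms are taken from the post.
draft_taxonomy = {
    "Driving": ["driving licences"],
    "Buses": ["bus regulation"],
    "Maritime": ["maritime training"],
}


def to_rows(taxonomy):
    """Flatten {group: [terms]} into sorted (group, term) rows, one row per term."""
    return [
        (group, term)
        for group, terms in sorted(taxonomy.items())
        for term in sorted(terms)
    ]


def to_csv(taxonomy):
    """Render the flat taxonomy as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["group", "term"])
    writer.writerows(to_rows(taxonomy))
    return buffer.getvalue()


rows = to_rows(draft_taxonomy)
sheet = to_csv(draft_taxonomy)
```

One level of groups with terms beneath them is all ‘flat’ means here. Any deeper hierarchy would come later, from testing with users.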
We’ve also been conducting interviews with some of the user groups identified in the session at DfT: driving instructors, MOT testers and a pilot, among others. We want to understand their information needs, the language they use and the tasks they need to complete.
How far we’ve travelled
Our goal for April to June was to audit 2 transport sub-themes. Thanks to the tremendous efforts of content designers in DfT and the agencies, we came close to completing all 10 – and did complete them before the end of July, with 7,396 items audited.
Our other goal was to have a draft taxonomy for testing with users, and to start research to understand their needs. Destination reached: the taxonomy has been refined after testing with users and stakeholders, and DfT and its agencies will start tagging content very soon.
The onward journey
Now the audits are done, the hard work has really begun: making the improvements to content identified during the audit phase. DfT and its agencies are phasing this work so the things that matter most for making the content findable – titles, summaries and correct content types – are fixed first.
Our next trip
DfT and its agencies have done great work on this transformation theme. And, working with UK Visas and Immigration, we’ve also made great progress on the ‘coming to the UK’ theme.
We’ve learned a great deal about doing large-scale content improvement and creating taxonomies. Between October and December we’re going to use this learning to turn content transformation into a product: a standardised set of tools, guidance, training and other resources that we can hand over to government departments and agencies to allow them to manage the process themselves. Watch this space!
You can follow John on Twitter, and don't forget to subscribe to our blog.
6 comments
Comment by Ian Turton posted on
I enjoy getting these posts. Wonder if I'm missing some/many though, as the vast majority of those I get are all about Agile and "front end" processing.
I'm retired now - but worked in PS many years. My experience is that the real issues lie with massive "old" backend systems - simply skinning them anew is to my mind running away from the difficult technical (and management) issues.
So it would be great to hear much more about that (e.g. how are legacy applications being handled? Rewritten? How are on-site (or off-site) old mainframes being replaced?)
Comment by GDS posted on
Hi Ian, we're not dealing with a legacy backend system, but with a system we built in-house. The following blogs may have posts that are more relevant to your interests:
https://gdstechnology.blog.gov.uk
https://insidegovuk.blog.gov.uk
You can use the search box on these blogs to search for specific topics you're interested in.
Comment by Umair posted on
I am curious. Do we still have to 'tag'? Haven't AI, machine learning and natural language search capability overtaken this?
Comment by GDS posted on
Indeed! Although we've learned that machine learning is only as good as the data you feed it. So, we're optimising our manual tagging activities to get to good training data sets, which our data scientists are using to polish their auto-tagging algorithms. We're seeing some good results and we'll no doubt blog about it soon.
Comment by Martin Glancy posted on
'...our developers figured out how to create a 950-item list of transport content'
It would be useful to know how they did this. For example, was this by keyword analysis of body text, or of titles, or did you use search terms?
Comment by GDS posted on
Our team of developers ran a Latent Dirichlet Allocation (LDA) based algorithm (topic modelling) on the content inventory to group similar content items and then pick one content item from each grouping to give us a list of dissimilar content that covers the domain. You can read more here: https://github.com/alphagov/govuk-inverse-similarity.
You can also read more about our previous work using data science to build taxonomies here: https://dataingovernment.blog.gov.uk/2017/01/12/using-data-science-to-build-a-taxonomy-for-gov-uk/.