GOV.UK exists to make government services and information as easy as possible to find and use.
For that reason, we're not huge fans of PDFs on GOV.UK.
Compared with HTML content, information published in a PDF is harder to find, use and maintain. More importantly, unless created with sufficient care, PDFs can often be bad for accessibility and rarely comply with open standards.
Many departments are doing great work to move away from them. For example, the Driver and Vehicle Standards Agency (DVSA) blogged about how it created and published its strategy in HTML and Public Health England has written about its work to move away from PDFs.
Content managed by the GOV.UK team in GDS is entirely in HTML and the training, guidance and tools we provide for publishers encourage HTML by default. However, we still have around 200,000 PDFs on GOV.UK and we’re publishing tens of thousands of new ones each month. We’ve heard from GOV.UK publishers and we know there are pressures that can make it difficult to avoid using PDFs.
The default should be to create all content in HTML. If you can’t avoid publishing a PDF, ideally it should be in addition to an HTML version and the PDF must meet accessibility standards and archiving standards. We hope this post will help publishers explain the problems with PDFs to their colleagues and support moving towards an HTML-first culture.
Problems with PDFs
They do not change size to fit the browser
On a responsive website like GOV.UK, content and page elements shift around to suit the size of the user’s device and browser. However, PDFs are not designed to be flexible in their layout. They generally require a lot of zooming in and out, and scrolling both vertically and horizontally. This is especially troublesome with long documents and on small devices like mobile phones.
They’re not designed for reading on screens
People read differently on the web, so it’s really important to create content that is clear, concise, structured appropriately and focused on meeting the user need. A PDF document that was created for offline use will not suit the context of the web and is likely to result in a poor user experience.
It’s harder to track their use
We cannot get as much information from analytics about how people are using PDFs. We can get data on how many times a PDF has been downloaded from GOV.UK, but we cannot measure views of the file offline.
In addition, we cannot get data about how users have interacted with a PDF – for example how long they’ve viewed it for or what links they’ve followed. This makes it harder to identify issues or find ways to make improvements.
They cause difficulties for navigation and orientation
Depending on the user’s device and browser, PDFs might open in a new browser window, new tab or a separate app. Sometimes they automatically download to the user's device. Whatever happens, the user is taken away from the website when they open a PDF. This means they lose the context of the website and its navigation, making it harder for them to go back if they need to.
This is even more of an issue if the user goes directly to the PDF from a search engine. Without the context of the site the PDF is hosted on, they can’t easily browse to related content or search the website.
It’s also worth remembering that although many devices and browsers have PDF viewers built-in - and they are freely available to download - there are still users who do not have them, or cannot download them.
They can be hard for some users to access
The accessibility of a PDF depends on how it was created. For example, it needs to have a logical structure based on tags and headings, meaningful document properties, readable body text, good colour contrast and text alternatives for images. It takes time to do this properly.
Even if this work is done according to best practice, there’s still no guarantee that PDF content will meet the accessibility needs of users and their technology. Operating systems, browsers and devices all work slightly differently and so do the wide variety of assistive technologies such as screen readers, magnifiers and literacy software.
Some users need to change browser settings such as colours and text size to make web content easier to read. It’s difficult to do this for content in PDFs. You can magnify the file, but the words might not wrap and the font might pixelate, making for a poor user experience. Locking content into a PDF limits the ability for people to make these kinds of accessibility customisations.
It’s our responsibility to ensure that our users can access the information we publish. Plus, publishing content in HTML will also reduce the need to supply alternative formats on demand to users who can’t access a PDF.
They’re less likely to be kept up to date
Compared with HTML, it’s harder to update a PDF once it’s been created and published. PDFs are also less likely to be actively maintained, which can lead to broken links and users getting the wrong information. This can be especially problematic if a document has been published in multiple formats. Any changes need to be made to all the versions, meaning more work and more opportunities for error.
In addition, users are more likely to download a PDF and continue to refer to it and share it offline. They may not expect the content in the PDF to change and might not check the website to get the latest information. HTML documents encourage people to refer to the website for the latest version.
They’re hard to reuse
It can be very difficult to reuse content from a PDF by copying and pasting it. The design and layout of the PDF can produce unexpected results, particularly if it has multiple columns, hasn’t been structured correctly, or uses incompatible fonts.
We’re also working on tools to extend the use of our web content - such as a new content API and ways to measure the quality of content. These tools will not work with PDFs. Publishing content in HTML means it will work with new developments like these - and for whatever platforms we might use in the future.
Similarly, users cannot use browser extensions and add-ons such as Google Translate on PDF content.
Why do people use PDFs?
Despite all this, there are understandable reasons why PDFs remain popular in government. Below are some of the common reasons for creating PDFs and the counter-arguments GOV.UK publishers may find helpful as they help their colleagues make the shift to HTML.
They’re quick and easy to create
PDFs may seem to be the fastest option because they can be easily created from popular applications that people are already using to author and share documents.
Converting content into HTML takes a bit of time. However, as explained earlier, creating a fully usable and accessible PDF from a source document requires specialist knowledge and can actually take longer than creating the content in HTML.
Control over the design
Authors and publishers have more control over the layout, design and branding of a PDF. This can be especially important when there is a need to include complex tables and charts, which are sometimes tricky to create in HTML. However, the downside is that there will be people who do not or cannot access the content. Plus, the content will not benefit from the simple and consistent design of GOV.UK that’s been tested and optimised for users and is trusted as a credible source of information.
They’re easy for people to download and print
While this is certainly true, you can print HTML web pages just as readily. And modern operating systems and browsers also make it easy to download or save web content. And as mentioned earlier on, it’s not ideal for users to download documents as they can quickly become out of date.
They have the feel of a stand-alone product
We know from GOV.UK publishers that they’re often sent content for publishing that is already in PDF format. This might happen because authors want control over the final content and design - and PDFs are easy for them to create.
It can also be because the document was primarily created for offline use - after all, government is still very paper based. There’s a common feeling that a PDF publication is a more tangible and credible ‘product’ compared to an HTML publication.
These are understandable reasons, but they’re an outcome of an ingrained print culture and outdated content production processes. Government is transitioning towards a digital-first culture, but old habits and ways of working take time to change.
What we’re doing to help
We'll continue to improve GOV.UK content formats so it's easy to create great-looking, usable and accessible HTML documents.
We also intend to build functionality for users to automatically generate accessible PDFs from HTML documents. This would mean that publishers will only need to create and maintain one document, but users will still be able to download a PDF if they need to. (This work is downstream of some higher priorities, but is on the long-term roadmap.)
We cover the main problems with PDFs in the training that all GOV.UK publishers have to do. Discussion about these issues continues on the government content community’s Basecamp and at community events.
We want to hear from you. If you’re a GOV.UK publisher and have any suggestions for improvements that would help you to publish in HTML rather than PDF, please let us know.
---
Update 23 July 2018: Thanks for all the comments and tweets in response to this post - it’s great to see that it has sparked so much support and debate.
To clarify, we are not suggesting there is no place for PDFs on GOV.UK. There are some cases where a PDF might be required to meet the needs of the user. For example, when there’s a need for a static document to show what was said at a particular point in time. In these cases, publishers should continue to publish a PDF in addition to HTML. We strongly discourage publishing only a PDF in these cases - for all the reasons stated in the post.
When we have built the ability to auto-generate accessible PDFs from HTML pages and show detailed version history, the need for PDF in even these cases will diminish.
In the meantime, it’s worth noting the National Archives’ Government Web Archive scrapes the content of GOV.UK pages at regular intervals to capture what was published and how it looked at particular points in time. And of course you can download, save to PDF and print HTML pages natively in most operating systems and browsers.
58 comments
Comment by Adrian Hallchurch posted on
Here at HMRC we post quite a few minutes on the Group pages on GOV.UK, and these are all in pdf. Looking towards the EU directive, it would be great to post as html, but there is no option for this on the production site - the only option is to upload a new file. Is this something you can change?
Comment by Tony Daly posted on
A lot has been said on here about the problems of downloading HTML as PDF. There really is no need for this if the authoring process is changed. What should be done is that all documents should be created in Markdown, complete with version control. This should be the master version of the document, held in the HMG archives. The Markdown file(s) should then be processed using software such as Pandoc to create HTML and PDF versions. The HTML file should be used for the browser and the PDF offered for download for offline viewing.
It just needs a change of approach to authoring documents.
Comment by Tom Parvin posted on
Hi,
I regularly consume information from the .GOV pages within my role as a compliance officer. I review complex documentation relating to regulatory requirements on topics including ISA manage guidelines, legally required data submissions.
The transition to html has made my role significantly more difficult. As a compliance officer my role is to ensure we comply with the legal duties we hold. This means that when a guideline or return document is changed, I need to ensure we update our internal process or data output so that it is in line with the change. It is very easy to identify change in traditional document formats (.pdf, .docx) with the employment of comparison tools, however it is almost impossible to carry out this comparison with information hosted within html. This could be addressed through an improved revision history, which gave detail on the change as opposed to only the date it was last updated.
The only way I can address this is by taking copies of the data held on the web pages and storing them in my own library - this is obviously a time consuming process and is limited in application when dealing with larger, more widely distributed guidelines (take AEOI guidelines, for example).
This inefficient means of hosting information is risking the legal compliance of the organisation I work for, and likely many others. The .GOV team should be implementing change that makes information easier to consume by the people who use it, as I assume this is the core purpose of the site, rather than focusing on what benefits can be achieved internally.
Comment by Marty McFly posted on
You might want to unearth the code GOV.UK threw away when it erased businesslink.gov.uk, which introduced a Print to PDF tool in ... 2005. It allowed users to select articles and combine them in a basket, before creating a single accessible PDF from all the articles. The PDF had a cover page, table of contents and page numbers, and attractive typography. It could be downloaded there and then or emailed to the user. It was an instant hit with businesslink.gov.uk users. Thirteen years later, you appear to be planning something comparable to appear in the dim mists of the future ...
Comment by Neil Williams posted on
Thanks Marty. I’m glad to hear the ‘print to PDF’ tool on business link was something you found helpful. It’s useful for us to know the demand for a similar feature to inform our product prioritisation. If we had a DeLorean we could go back and grab that code, but even then it wouldn’t be compatible with the programming languages and structures we use now 😉
Jokes aside, your point is heard loud and clear. We will try to push this up the prioritisation list. In our defence it's not that we are delivering slowly, rather that we're delivering other, more pressing improvements. GOV.UK is incomparably larger and more complex than Businesslink was, having replaced that site and nearly 2,000 others - and we're delivering on behalf of the 410 organisations that share the platform. But yes, this is a much needed feature and it's helpful to hear that confirmed in the comments on this post.
Comment by Emily da Costa posted on
Great to get a sense of GDS's position and to see the healthy debate below the line! Some ideas that would really help us in practical terms:
* link checking in HTML attachments (this is pretty basic stuff)
* the ability to use internal linking within HTML attachments (at the moment if we link to specific sections we have to use whitehall links for checking and then change these to live links before we publish)
* the ability to print whole manuals with one click or offer a downloadable PDF alternative: this is absolutely crucial for us as some of our primary audiences often need this information offline
* a Word (or open equivalent) to markdown converter (don't know if this is feasible or not)
* better version control: I think this was something GDS has claimed to be looking at but an indication of timescales on this would be very reassuring.
We love the idea of users being able to generate a PDF version from HTML - and this would go a really long way to help us sell an HTML-first content strategy internally. I really hope we do get this functionality soon.
Comment by Dan posted on
Hi, interesting and timely blog! I lead on Agent Toolkits (https://www.gov.uk/government/collections/tax-agents-toolkits) which are accessible PDF guidance documents that address common errors seen in filed returns.
We have been looking at a way to potentially move away from PDFs through our E-learning platform. We have a live BETA test version of one of the toolkits currently live: http://www.hmrc.gov.uk/courses/syob4/tlkt_inctaxlosses_2017to2018_guide/
Our users do like the PDF format but I understand at some point we will need to move away from this. Currently this transformation project has been placed on hold due to other departmental priorities. It would be interesting to hear your thoughts on the BETA version (which technically does still contain a minor element of PDFs with the checklist. We have found it difficult to assess what could be a suitable printable alternative checklist).
Comment by M Ocram posted on
What user research have you performed to support your assertions? Can you please publish it?
Comment by Neil Williams posted on
Thanks for your question.
Most of the points raised in the blog post are technical truths. Some of those factors depend on how the PDF is created, of course. The PDFs created by government range from being images with unselectable text, all the way to PDFs that are able to work with assistive technology.
From a survey we ran on assistive technology at the end of 2016 (https://accessibility.blog.gov.uk/2016/11/01/results-of-the-2016-gov-uk-assistive-technology-survey), we know that screen magnifiers and screen readers are the most commonly used types of tech. These happen to be the technologies that PDFs pose problems for the most. A recent update to Acrobat Reader has certainly helped, adding the ability to customise background and text colours for example, but it’s a setting that’s buried away, and means we’re asking users to learn how to use something else (and download more software).
Comment by Thomas Smith posted on
Most annual reports by companies and organisations are in PDF, and HTML is unpopular for this information. However, I like the idea of an automatic conversion tool. If this could be integrated into the requirements for annual reports to be in xHTML from 2020, I think it would be very helpful.
Comment by Mark Richards posted on
Since HTML 5, HTML is an unversioned living document, the version of it today is not necessarily the version of it tomorrow.
Which might not be a problem if it promised backwards compatibility, but it does not, and HTML features from previous versions have already been deprecated and, browsers advise, can be made obsolete at any point. HTML is not designed for archiving. https://developer.mozilla.org/en-US/docs/Web/HTML/Element#Obsolete_and_deprecated_elements https://html.spec.whatwg.org/multipage/obsolete.html
Anyway, let's take the first example of the DVSA.
https://www.gov.uk/government/publications/dvsa-strategy-2017-to-2022/helping-you-stay-safe-on-britains-roads-dvsas-strategy-for-2017-to-2022
Notice the YouTube video.
Public records should be expected to last for decades at the very least, if not hundreds of years.
Why is the publication dependent on a service that is famous for breaching the public's privacy, has just been fined as part of a competition lawsuit against it (and other Android bundled applications), can control which members of the public can access the content, surveils users, controls what video formats it will be available in and may charge users in the future? Never mind that if there is any problem with the company in question, its existence is in the control of another sovereignty, which could shut it down, restrict UK access or let it go bankrupt.
Public records should be in the control of the public, not any third party organisation that does not answer completely to the UK government's wishes and control if necessary.
It appears our public records are going to have an increasingly large number of holes in them from third parties they depend on and HTML features that may be removed.
PDFs may not be great, but it seems they're more likely to stand the test of time than HTML.
Comment by Neil Williams posted on
Hi Mark,
Thanks very much for your comment.
HTML is indeed a living document, the way the internet operates necessitates the ability to keep pace with how it’s used.
As I mentioned in the update on 23 July, The National Archives do a great job of scraping a copy of the pages on GOV.UK. While doing this, they make sure they capture how the pages looked (including features that may become deprecated in future).
They also back up government Twitter feeds and YouTube channels. That said, they don’t seem to have that one from DVSA so I’ll get in touch with them to see if it’s in their queue.
Comment by Keith Emmerson posted on
Hi Mark,
Small update on this. I've been in touch with The National Archives, they will be adding the DVSA YouTube channel to the WebArchive (http://webarchive.nationalarchives.gov.uk/video/) shortly.
Other departments' channels will also be added in due course.
Take care,
Keith
Comment by Rebecca Cave posted on
I asked the community on AccountingWEB.co.uk whether they wanted Govt information as PDF - note the sub-heading was written before the clarification to this blog was posted on 23 July.
You can read the 33 responses here: https://www.accountingweb.co.uk/any-answers/do-you-want-govt-information-in-pdf-format
The vast majority wanted to keep PDFs to access while offline and save as part of client's files.
Comment by Gemma posted on
Hello, you say "We want to hear from you. If you’re a GOV.UK publisher and have any suggestions for improvements that would help you to publish in HTML rather than PDF, please let us know." But how do I do that, in these comments?
I'm a government statistician, and we produced our statistics in HTML for the first time recently. They look a lot better than in a PDF, and they're more accessible. But some people still want a PDF, and using print to PDF looks awful and the charts disappear. ONS.gov.uk has the functionality to download html stats bulletins as a PDF with the click of a button. When might we expect this on gov.uk? Thanks!
Comment by James Smith posted on
As a very frequent user of gov.uk for work purposes as a Chartered Accountant, I have found that the general quality of documents has been declining for some time in terms of ease of use, largely due to pages being dynamic and much harder to read than "old style" PDF documents, which have a much better and more readable format. Those also seem to have been written by far more competent members of staff than the web content, which is often rushed and incomplete.
Remember the text is for reading. That is its prime function.
It is also crucial in my role as a tax advisor to take the CURRENT advice or interpretation of tax legislation, and tag that to a client file when giving advice, as when a tax investigation occurs, which may be as long as 5-8 years after having given that advice, the current document will inevitably have been changed in that time. So having to output HTML text to a "jumbled" export is not good form. This is all saved electronically, but PDFs are much easier.
Comment by Roger posted on
Excellent post Neil and I agree with you about the 'ingrained print culture and outdated content production processes.' I wonder if the publication content type in GOV.UK may contribute to confusion as it presents HTML and PDF attachments as if they're equivalent - even having pages labelled as 'HTML attachments' within an HTML website may be creating confusion for people who don't know what HTML is.
Comment by Neil Williams posted on
Thanks for your comment, Roger.
It’s a good point, and interpretation of this will vary according to the technical knowledge of the user. For someone who knows how the web operates, this is likely to be very frustrating – like you say, it’s all HTML.
HTML attachments were labelled as such within our publishing application to show equivalence of function and importance; HTML pages are just as good as PDF files.
As our publishing system has matured, we could label these more accurately, but we need to make sure we balance the need to show that they satisfy the publisher’s desire to demonstrate it as a discrete document.
Comment by Nathan Dolan posted on
Neil,
I agree with most of what you have said here with respect to using HTML for content delivery, and many services (especially gov.uk) should have no need for PDF. But it is *very important* to recognise that HTML is a very poor choice for documents of any legal significance.
- HTML cannot be digitally signed in any sensible way. PDF is the de-facto standard for electronically signed documentation. See also PAdES under eIDAS legislation.
- HTML is not in any way a long-term archival format. PDF/A is. HTML looks different on different devices and browsers today, let alone in 25 years.
- Downloading (saving) HTML. I have to disagree with you strongly here Neil. Saving as HTML is a terrible idea. Anything saved is potentially broken the moment your web session expires, because it typically doesn't embed content; it links to it. Even for basic static content, the sands will shift over time (links changing and such like). HTML that renders now may well not render tomorrow. This is why very few users would ever save as HTML (in my experience).
- The alternative to the above (printing to PDF), is unreliable and generally results in non-accessible, non-archival, sometimes corrupted PDFs. For most users, this sucks less than saving as HTML, so they do this.
- "It’s not ideal for users to download documents as they can quickly become out of date." Sorry Neil but you are monumentally missing the point of documents. Is your June bank statement out of date because it's now July? The point of documents is that they *can not* change. Especially legally significant ones. D'oh!
- There is nothing non-accessible about PDFs. You just need to tag the structure. This is entirely analogous to HTML. You can create HTML that is an accessibility nightmare just as easily as doing likewise with PDF.
- PDF is the de-facto ISO open standard (ISO 32000-2:2017) for electronic documents for a reason.
HTML is great for content but sucks for documents. PDF sucks for content but is great for important documents.
For important documents, services should create proper archival, accessible (tagged) PDF documents, ideally signed with a verifiable (by a TA) corporate signature using PAdES.
Comment by Neil Williams posted on
Hi Nathan,
Thanks for your well-reasoned comment! I’ll reply to your 7 points in turn, if that’s ok.
1: Most of the PDFs uploaded to GOV.UK aren’t documents that require the user’s electronic signature, and our HTML documents aren’t designed for this use either. Where there’s a need to obtain some sort of legal declaration from the user, we advise departments and agencies build online services. It’s then up to them to decide on the most appropriate method that is approved under eIDAS, be it a check box or something more sophisticated.
2: Very true – web pages don’t naturally lend themselves to archiving. Fortunately for us, The National Archives’ Government Web Archive go to great lengths to not only capture the content of our pages when they scrape them, but also how they looked.
We ask publishers to make sure they comply with the PDF/A standard whenever they upload a PDF, but this isn’t always possible.
3: Saving as HTML could certainly be better supported, and isn’t the easiest thing in the world. That said, I’ve had good experiences saving in Chrome 67.0.3396.99 (saving as ‘web page, complete’).
Links can and do change, we try to avoid this by using redirects wherever possible, but the same is true of links from PDFs too.
4: Printing to PDF is a compromise, I agree, and uses our print-specific stylesheet. As it stands, that method preserves the words and hierarchy of a page, rather than the decorative elements. As I mentioned in the post and my earlier comment, we’re hoping to add a feature that automatically generates a PDF of the page within our publishing software.
5: There’s certainly a user need to be able to retain old copies of files, your example of dated bank statements for one, and things like historic tax rates so accountants can check past payments. However, much of our content should be updated as policies, environmental factors, and practical considerations necessitate, something PDFs aren’t the best format for.
6: We can’t categorically say that PDFs aren’t accessible. If made properly (with the heading tagging that you mentioned etc.) they are fine in most cases. We’ve found that in some instances (even when everything is done correctly), users who need to magnify or change the colours struggle to do so. There are also more variables in both the software used by the producers, and the readers. As I said in the post, one of the reasons PDFs are made is because they’re quick. Taking the time to make them properly accessible and test them takes away that speed. Our publishing software is more of a controlled environment, so we have a greater chance of making sure our HTML pages are accessible to all.
7: The ISO standard is definitely a step in the right direction, and I’m sure it will help foster consistency when sharing electronic documents. It’s a shame it doesn’t extend to converting documents to PDF; hopefully that will be covered in the next revision.
Comment by Josh Levett posted on
Is there any particular reason HTML publications on GOV.UK don't get given a cover image?
It makes the PDF editions of documents look like the preferred version as they get the official-looking cover preview.
Moving forward with an HTML-first focus, would enabling cover images for HTML publications not make sense?
In addition - could the GOV.UK backend handle the conversion of HTML publications to PDF/A format automatically, with a new PDF created for each change made to the HTML 'master' version to aid the convenience of offline reading?
Comment by Pete Hewitt posted on
Really interesting comments on some of the potential pitfalls and how to fix them but this is definitely the way to go.
I'm curious to know how the back end works to create the HTML Document style. Is it essentially the same as a normal web page editing screen or can it suck content in from Word or such like and format from there?
As Higher Ed is another guilty party when it comes to endless PDFs this is something we'll need to tackle in the near future but working out the exact workflow will be a challenge.
Comment by Jason Rogers posted on
Problems with GOV.UK ... and the “hate pdf and paper” strategy ...
Capturing content for use offline is more or less impossible. Adding a simple “print to PDF” option that renders documents in reasonable 10 or 12 point text rather than the 18 or 20 point that GOV.UK creates would be a first step. Remember that sometimes people are not online - and this is way more often than you might think.
Many, many people still prefer the printed page. They may even print documents to read long form to avoid the problems of extended screen use. GOV.UK does nothing to address this - trying to print the html pages results in comically large text with no attempt to format content in page form. It also wastes reams of paper.
The ethos for the site seems to focus on accessibility first - whilst forgetting just how unusable this can make content. The site needs to be accessible AND useful to EVERYONE - not magnified and dumbed down. I won’t comment here about the clumsy “journey” to log into the HMRC site these days ...
Text size is way too big - this hampers readability rather than improving accessibility. Yes, I could ask the browser to make the text smaller but when I hit ‘back’ this makes the previous pages unreadable. I can’t even shrink the size on my phone or tablet. It’s time to add a simple function at the head of the page to shrink the text to normal sizes or to change the default. Better yet, have a choice of style sheets to render the content at normal size rather than like an infant's reading book. I guess we might call this “truly reactive”.
To close, PDF remains the most useful way to package a document for offline use with near 100% reliable rendering. HTML, and worse, coupled with poor design choices like here, is nowhere near this level .... yet. The PDF haters need to think about why people find them so useful.
Comment by Neil Williams posted on
Hello, thanks for your comments on this.
I don’t want you to think that we hate paper and PDF, and I’m sorry the post came across that way.
There are definitely times when paper is best, and there is a place for PDFs where there is a proper reason for them (we only listed the common ones). Our push to move towards web-native formats won’t and shouldn’t replace those scenarios. Also, these rules only apply to government’s web publishing activities.
We have a responsibility to make GOV.UK available for everyone, and we work towards making it accessible for the greatest number of people. We have to accept that we’re going to struggle to make it perfect for 100% of our users, but knowingly excluding users because of their access needs isn’t acceptable.
Large text is easier to read, but does require more scrolling. On balance, it helps more people than it hinders. Rather than considering this as ‘dumbing down’, we think of this as opening up government to people who weren’t able to access it before. This was either because it was too complicated, or that they were prevented from doing so because due care wasn’t given to people with disabilities.
Comment by Harry Lund posted on
In my experience the main reason people default to PDF is that document owners/creators all have the tools to create them (i.e. Google docs or Word) - whereas to create an HTML page you have to either find a gov.uk publisher (few and far between in some departments) or become one yourself. It's quite cumbersome to work through an intermediary on a document undergoing frequent revisions, so people tend to wait until the final version is ready before looking into creating an HTML version - at which point you can be timed out, so just go with the easy PDF version.
I'd love to see lots more people trained up to be publishers. And indeed I've been trying to sign up for the training myself, but have been told no courses are currently planned.
Comment by Neil Williams posted on
Thank you for all of these comments. A rare thing these days to have this much below-the-line action on a blog post 🙂
I agree with many of the pro-PDF points being made here. They have their uses, and where that's the case we would recommend publishing both in HTML and PDF (and we will at some point add a feature to make that happen automatically from our publishing software). The problems come when PDF is the only format on offer - that's the behaviour I would love to confine to history.
Agree also with points on version control. I should have made it clear in my post that GOV.UK has features to track versions of HTML content, through the 'page history' at the foot of each page (which includes notes about what has changed). We store the entire content of all past versions of HTML pages in our database and intend in future to make that whole rich history available through our API. The same is not true of PDFs, which can be overwritten using the same file name, without retaining history of changes (other than by comparison with offline copies).
Glad this post has sparked so much support (and a healthy degree of debate) here and on Twitter.
Comment by David Tallan posted on
We, too, have been moving away from PDFs to HTML for just the same sort of accessibility reasons you describe in the article. We are starting to hear more and more frequently from librarians and archivists who are concerned about the long term impacts of this in terms of preserving the government record. They share the concerns that Nathan raised above that "HTML is not in any way a long-term archival format. PDF/A is. HTML looks different on different devices and browsers today, let alone in 25 years." Aside from the concerns about looking different, the self-contained nature of PDFs makes them much easier to archive.
What is GOV.UK doing to support long term preservation of the record? Is the assumption that everything will always be accessible to researchers and scholars of the future through the current platform (or its descendants)? Or are materials regularly copied for long term preservation elsewhere? If the latter, do you find HTML a challenge for that?
Comment by R K Hayden posted on
Whilst I agree that PDF should never be the only format in which content is made available, I think the scenarios in which the GDS team views PDFs as an appropriate option are too narrow. As noted by others, PDFs are good for reading offline. (And by offline, I still mean on-screen, not on paper). For documents that if published as html would be spread over many html pages, a single PDF file also provides an easily searchable document. You should be providing users with the option to access documents in a range of open formats.
I also do not agree with your statement that "They’re not designed for reading on screens". They're only not designed for reading on screens because the designers of the documents have chosen to make them so. Whilst it is true that, as you say, "A PDF document that was created for offline use will not suit the context of the web and is likely to result in a poor user experience." that doesn't mean that you can't produce a PDF that is designed to be easily read on screen, whilst still being easy to read in a print format. Many PDFs, even some labelled as "web-optimised" - such as the one at https://www.gov.uk/government/publications/industrial-strategy-building-a-britain-fit-for-the-future - are laid out in two-column format, whereas a single column format is more suitable for viewing on-screen.
Comment by Ian Taylor posted on
Our company did the same thing. However, it has taken away one crucial feature that PDFs support: offline reading. If I want to read a document on an iPad/phone where there's no wifi - how do I do that? Until you address this problem you have to accept that you've removed one route to accessing your material. Printing multiple web pages to PDFs hardly solves this, as you're assuming the user could collect all the material they need before they go offline.
So yes, PDFs have their disadvantages but like many content (and technology) providers, your decision is based on convenience to yourself, and not the user.
Comment by Adrian Barker posted on
From a user perspective, pdfs are often a lot easier and more convenient, particularly for longer 'published documents'. Slow internet connections can slow down reading html. A well designed pdf can have good navigation. For offline viewing, saving web pages with multiple small files can be a pain. Shouldn't there be good, clear version-control information easily accessible on both html and pdf? Clearly html is appropriate for most web content but sometimes a pdf is better from the user point of view. For reading on other devices, why not consider other formats like epub (though there are problems with non-text)?
Comment by John Norman posted on
My perspective as an end-user is that I have no particular interest in PDF per se, but I do want to be able to capture the state of a web page at a particular time (especially if I am acting on the advice of that web page). So I would urge you to continue to improve the printability of pages, which has the side effect of allowing many users to create PDFs from the print dialogue. I checked this page and things are definitely improving. Print margins may be an issue still.
Comment by Andy posted on
Bad idea. Version control is essential for government documents and official forms, which you can't do easily in HTML. PDFs print far better too.
Comment by James posted on
Maybe people don't remember the time when all the world required Microsoft Word Documents. PDF was the thing that broke that monopoly.
PDFs aren't perfect but they provide a way of publishing something and people being able to actually have a copy of that information.
Also HTML/sites have a MAJOR problem with Link Rot.
Comment by Cyril Randles posted on
Many of your arguments for html seem to be based on your own needs for data and analysis.
As a user I would like to be able to download a time stamped version which is readable as a single document. I would be happy to have a rubric built in to such a version which gives a creation date and version and a warning that updates may have taken place since the download.
For legislation and regulations the ability to extract a version which is valid at a particular date would be valuable.
Comment by Paul Driver posted on
Schools are terribly guilty of this dirty bad habit.
Why the new EU rules coming in next year are providing them with a fairly broad exemption is beyond comprehension.
In full disclosure, I am a school website provider.
Comment by Abdurrehman posted on
Open Government Data standards are on average still below 3 star in the UK because much of the data is not machine readable or understandable. To make it so, one has to put in massive manual labour to extract information out of encrypted files like PDFs.
HTML can be converted to a machine readable format easily; it can be converted to RDFa or a graph DB.
Because data is collected with a space of microseconds it's almost outrageous to rely on an offline copy of it.
Comment by Abdurrehman posted on
I think there's still a lot to cover. Our final destination is not PDF or HTML; it's to be machine readable in the context of web 2.0 or linked data.
Comment by Steve Messer posted on
Hear, hear! Make the Web understandable to humans _and_ machines.
Comment by Paul Bradley posted on
Bravo!
Similar issues exist in the higher education world where PDFs quietly rust away in their tens of thousands across hundreds or thousands of sites and micro-sites.
Two reasons PDFs persist in higher education are (a) a belief that they deliver a better user experience for "long documents" - that is multi-thousand word rules, regulations and reports and (b) professionally formatted prospectuses, reports and the like, offer a user experience that HTML can't.
Time for a bit of digital education in higher ed?
Comment by Howard Pang posted on
I await the day we're all using Markdown.
Comment by George Davies posted on
HTML DOES require software to read unless you are proficient in turning tags into meaningful information.
HTML requires an INTERNET Browser. Yes most OSs come with one built in.
Comment by George Davies posted on
Just as long as you leave the ability to consolidate all the many and various chapters of the huge documents that you are publishing this way into a single version for someone who still likes to read a BOOK.
Comment by Kenneth Levinski posted on
Version control is the biggest hurdle I see. I find that users have great satisfaction in knowing that "this copy of the document hasn't changed since I last saw it".
For example, the sample publication https://www.gov.uk/government/publications/dvsa-strategy-2017-to-2022/helping-you-stay-safe-on-britains-roads-dvsas-strategy-for-2017-to-2022 given as a good standard by https://www.gov.uk/guidance/content-design/content-types#html-publications has no way to be easily downloaded and no way to know if it has changed since it was published, and how it was changed since it was published. It does have a metadata page with that information, but you have to search for it from the publications home page. The metadata page has a change log, but no diff and no way to download the older version.
Comment by Adrian Hallchurch posted on
Most of the content I receive is either in pdf or Word - what would be really useful would be a simple reference guide for officials to use to ensure that they produce content that is easily converted to html (for example, one that explains heading hierarchies, and limiting factors such as not including text tables).
Comment by Clive Lever posted on
All of this should also apply to content on local government websites. It is worth sharing it with the LGA.
Comment by Kenneth Tombs posted on
As someone with a very long standing interest in this, HTML is perfect for delivering content online. It is diabolical to work with offline and there are many circumstances where a single 'record' entity is still required. PDF may not be an 'open standard', but it is a universal, reusable/transportable single entity document. HTML is a programmatic language, you can't assume everyone can/will speak HTML, and who physically prints anything these days? Unless it's printing to a PDF!
Comment by Thomas Edwards posted on
PDFs require software to be read. HTML does not. HTML will outlive PDFs.
Comment by John posted on
Your browser of choice is the software that reads HTML. HTML by itself is not viewable as intended if you open the html file in software that doesn't support it... same for PDFs.
Comment by James posted on
err, all documents on a computer require software.
html generally requires one of firefox, chrome, safari, edge or internet explorer.
pdfs can be read using firefox, chrome, safari, edge or internet explorer.
Comment by Andy posted on
HTML requires software to be read. A web browser.
Comment by Kenneth Johnson posted on
How is the HTML read if not with software?
Comment by paul posted on
HTML requires software to read too, just that every modern puter comes with the software pre-installed.
Comment by Matthew A posted on
HTML requires software to be read. There are significant differences in the rendering of HTML across the software written over the last 25 years, and its ability to be read well on both desktop and mobile devices.
Comment by Christopher posted on
Your web browser is software. So is a text editor, to anticipate your response.
Comment by Leonard Rosenthol posted on
PDF has been an open international standard (ISO 32000-1) for a decade now (since 2008), longer than HTML5 has been. And subsets of PDF, such as the one used for long term archiving (PDF/A, ISO 19005) have been around even longer.
HTML requires software too - what do you think a web browser is? And every single browser on every single major OS platform (desktop and mobile) includes native support for PDF rendering, as well as HTML.
Comment by Chris Moore posted on
Fantastic! All very welcome, especially as we move into the era of the EU directive on accessibility.
Comment by Dave Thackeray posted on
Amen. Sounds like you've not quite cracked the nut, but that you're at least endeavouring to be where we all need to be. In a PDF-free world!
Andy Crestodina says PDFs are 'the rust of the internet'. I'm wholly inclined to agree.
Looking forward to seeing a page showing your race to zero. Every time someone strikes off a PDF with an accessible HTML page in its place, you'll creep ever closer...
Comment by Tim Blackwell posted on
Your rust is my evidence. Ideally we would have a truly self-contained document format without the drawbacks of .PDF. Until then, HTML sites which can be revised/withdrawn without notice are not an acceptable substitute.