Guest post: looking at the different ways to test content

Today we hear from two content experts sharing their thoughts about testing content. Christine works with GDS as a trainer, and is a content strategist. Emileigh is a lead content designer and strategist working at 18F in the USA.

They met for the first time on a sweltering summer day in Washington, DC. A quick cup of (iced) coffee turned into a months-long, transatlantic conversation on the different ways writers can test online content to see whether it needs improving.

To paraphrase Supreme Court Justice Potter Stewart’s famous saying, we know good content when we see it.

Good web content is clear. It’s actionable. Readers find what they’re looking for — and they’re looking for a lot these days. People rely on websites to conduct research, fill out tax forms, read the news. We know good content when we see it, and we’re frustrated when we don’t.

Keeping this in mind, are there ways that writers can quantify and measure their writing? We’ve looked at different tests you can run depending on the age of your audience. Finding appropriate ways to test our content helps us improve and find best practice patterns for creating copy.

Hand-drawn diagram on graph paper. Graph shows a set of axes reading Comprehension (north); Success (east); Confusion (south); Failure (west). Star is located equidistant between comprehension and success. Caption: Christine’s graph of good content. The star marks the sweet spot.

Who you are writing for

Readers will come to your content with varying levels of existing knowledge and different educational backgrounds. This has a huge impact on how you create content. Tell knowledgeable users what they know and you’ll bore them. Assume they know more than they do and you’ll frustrate or even lose them.

Here’s a good rule: if you’d like to reach a broad range of users, you should strive to write at a middle-school level. A 2003 Department of Education assessment showed ‘average’ Americans read at a seventh/eighth grade level, and in the UK it’s been reported that the average reading age is 9 years old (15 would be the maximum).

Whenever possible, keep things simple, short, and clear.

Testing your content depends on your audience

When it comes to seeing if the content you’ve written is working, the way you test it will depend on who you’ve written it for. We need to pay particular attention to how we frame the exercise, because of the stress associated with taking a ‘test’. It’s the content that’s being tested, of course, not the person, but this distinction needs to be made clear.

Open-ended questions

Asking open-ended questions such as “What does this mean to you?” is helpful when you’re creating content for people with different cognitive needs. Maybe they have learning difficulties or aren’t familiar with the language. On GOV.UK we might ask open-ended questions of people using the PIP checker or applying for a visa, to see how much they understand.

If you’re writing content for children, using open-ended questions is a good way to see if they understand the information.

In 2015, Emileigh led the content strategy for Every Kid in a Park, the website for a Barack Obama initiative to let all US fourth graders and their families visit national parks for free.

Emileigh says: ‘Using open-ended and task-orientated content questions allowed me to keep pressure low and still measure how well kids understood the programme and their overall enthusiasm for the site. For example, “How would you use this website to sign up for Every Kid in a Park?”’

Let people choose their own words

When you let people tell you how they feel in their own words, you can use the same language in the copy, thereby letting your audience have a direct influence on the content. This technique can be used on sensitive content such as dealing with health, relationship breakdown or loss. For example, GOV.UK has content on what to do after someone dies, not after someone ‘passes away’.

Christine recently worked on an app that helps young people in care prepare for meetings. They can also use it to help them talk about their feelings, for example to a social worker.

In the app, young people can choose the feelings that they’re experiencing, plus add their own if they want. We wanted to test whether the feelings we’d chosen were representative and appropriate.

In testing, target users were asked to list all the feelings they’d experienced in the previous two weeks. We then matched these to the ones we had in the app, plus the feelings people had written into the ‘add your own’ field.

By letting users identify their own words we could see if the assumptions we’d made about people’s state of mind were accurate or not.

A/B testing

This kind of testing compares two versions of content to see which performs better. It’s also a good way to test how users connect with your content. Maybe your site is easy to read and understand, but users aren’t interacting with it in the way you hoped.

The organ donation sign-up case study shows how A/B testing works. Because this message appears after booking a driving test, we know users will be over 17 years old.

The National Health Service (NHS) wrote eight variations of content asking users to sign up as organ donors. For example:

Please join the NHS Organ Donor Register.
Please join the NHS Organ Donor Register. Three people die every day because there are not enough organ donors.
Please join the NHS Organ Donor Register. You could save or transform up to 9 lives as an organ donor.

The sign-up rate for each piece of content was measured and the most successful was:

Each content variant the NHS tested was written in plain language and was easily understood. The A/B test showed which call to action was most effective (though not why).
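To make the comparison concrete, here is a minimal Python sketch of how two variants’ sign-up rates could be compared. It is not the NHS’s method: the function names are ours, the counts are made up for illustration, and the two-proportion z-test is one common way to check whether a difference is likely to be more than chance.

```python
from math import erfc, sqrt


def signup_rate(signups: int, views: int) -> float:
    """Share of people who saw a variant and then signed up."""
    return signups / views


def two_proportion_z_test(signups_a, views_a, signups_b, views_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    rate_a = signup_rate(signups_a, views_a)
    rate_b = signup_rate(signups_b, views_b)
    # Pooled rate: the sign-up rate if both variants performed the same.
    pooled = (signups_a + signups_b) / (views_a + views_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_error == 0:
        raise ValueError("Cannot compare variants with no variation in sign-ups")
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value from the normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical counts, invented for this example only.
z, p_value = two_proportion_z_test(
    signups_a=230, views_a=10_000,
    signups_b=290, views_b=10_000,
)
print(f"Variant A rate: {signup_rate(230, 10_000):.2%}")
print(f"Variant B rate: {signup_rate(290, 10_000):.2%}")
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be down to chance. Like the A/B test itself, it says nothing about why one message worked better.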

Cloze testing

For content about sensitive subjects such as finance, regulation and health, the Cloze test is ideal to help measure your readers’ understanding.

In the Cloze test, participants look at a selection of text with certain words removed. Then they fill in the blanks. When creating a test, you can delete words using a formula (every fifth word), or you can delete selectively (key words). You can accept only exact answers, or you can accept synonyms. Sample as many readers as possible for greater accuracy.

When developing Cloze tests for betaFEC — the US Federal Election Commission’s new web presence — Emileigh selected passages of at least 150 words, deleted every fifth word, and accepted synonyms as answers. The target was 50% or greater accuracy; in practice, FEC’s Cloze test scores ranged from 65% to 98%.
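The approach described above (delete every fifth word, accept synonyms, score the share of correct answers) can be sketched in a few lines of Python. This is an illustration rather than the tooling used for betaFEC: the function names, the blank marker and the synonym list are all our own assumptions, and the sample passage is far shorter than the 150 words recommended.

```python
import string


def make_cloze(text: str, n: int = 5, blank: str = "_____"):
    """Blank out every nth word. Returns (gapped_text, list_of_answers)."""
    gapped, answers = [], []
    for position, token in enumerate(text.split(), start=1):
        # Strip ASCII punctuation so "forms," is stored as "forms".
        core = token.strip(string.punctuation)
        if position % n == 0 and core:
            answers.append(core)
            # Keep any surrounding punctuation in the gapped text.
            gapped.append(token.replace(core, blank, 1))
        else:
            gapped.append(token)
    return " ".join(gapped), answers


def score_cloze(responses, answers, synonyms=None) -> float:
    """Share of blanks filled correctly (0.0 to 1.0), ignoring case.

    `synonyms` maps a lower-case answer to other words you will accept.
    """
    synonyms = synonyms or {}
    if not answers:
        return 0.0
    correct = 0
    for given, expected in zip(responses, answers):
        accepted = {expected.lower()}
        accepted |= {s.lower() for s in synonyms.get(expected.lower(), ())}
        if given.strip().lower() in accepted:
            correct += 1
    return correct / len(answers)


passage = (
    "People rely on websites to conduct research, "
    "fill out tax forms, read the news."
)
gapped, answers = make_cloze(passage)
print(gapped)

# A hypothetical reader's answers, with one hypothetical accepted synonym.
reader = ["to", "taxation"]
print(score_cloze(reader, answers))                          # exact answers only
print(score_cloze(reader, answers, {"tax": {"taxation"}}))   # synonyms accepted
```

Averaging `score_cloze` across all your readers gives a figure you can compare against a target such as the 50% used for betaFEC.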

Preference testing

Christopher Trudeau, professor at Thomas M. Cooley Law School in Michigan, did research into legal communication to find out ‘to what degree do clients and potential clients prefer plain language over traditional legal language’.

He found that the more complex the issue, the greater the reader’s preference is for plain English, and that the more educated the person and the more specialist their knowledge, the greater their preference is for plain English.

He employed a range of ways to test content for his research, including A/B testing and asking respondents:

“Would you prefer this or this version?”

then following up with:

“Why / why not?”

He then asked a longer series of qualitative questions, eg:

“Have you ever read a document that was difficult to understand?”

“Did you persevere?”

“Why did you stop reading it?”

He named the method of using a mixture of A/B then qualitative questions ‘preference testing’.

For checking whether people understand risks (in informed consent cases) he says the best way to check comprehension is for the person who has read the document to be asked follow-up questions, eg: “Based on what you read in the document, can you explain the main risks to me ...”

This means a real person - not a computer - uses their judgment to assess whether the reader understood the content.

Measuring what you’ve written

Over time, organisations have developed reading scores and indexes for measuring the ‘readability’ of content. Examples include the Coleman-Liau index, the SMOG index and the Gunning fog index.

One that we find consistently suits our needs is the Flesch-Kincaid grade level. Developed for the US Navy, Flesch-Kincaid measures sentence and word length. The more words in a sentence (and the more syllables in those words), the higher the grade level.

Using these formulas will help you quickly estimate how difficult your text is. It’s a clear metric that can help you advocate for plain language. But, like every formula, Flesch-Kincaid misses the magic and unpredictable nature of human interaction.

Readability formulas also can’t help you figure out that “Patience you must have my young padawan,” is harder to read than “You must have patience, my young padawan.”

Close up of Yoda (Star Wars) — The only difference between standard English and 'Yodish' is the word order; but this change can make it harder to understand.
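To show why a formula can’t spot the difference, here is a rough Python sketch of the Flesch-Kincaid grade level. The coefficients are the ones commonly published for the formula. The syllable counter is our own crude assumption (it simply counts groups of vowels), so treat the scores as estimates rather than what a dedicated readability tool would report.

```python
import re


def count_syllables(word: str) -> int:
    """Very rough estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the US school grade needed to read the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("Text must contain at least one word")
    syllables = sum(count_syllables(word) for word in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


yodish = "Patience you must have my young padawan."
standard = "You must have patience, my young padawan."

print(flesch_kincaid_grade(yodish))
print(flesch_kincaid_grade(standard))
```

Both sentences get exactly the same grade, because the formula only counts sentences, words and syllables. It never looks at word order.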

We’d love to hear about your ways of testing content and comprehension.

Follow Christine on Twitter, Emileigh on Twitter, and don’t forget to sign up for email alerts.

3 comments

Comment by Bob Mathers posted on 07 April 2016

Interesting & thought-provoking. Initial reaction was, if they don’t understand my copy, it’s their problem, but it opened my eyes to the idea of running text through a mincer. Putting my superior language skills to the side for a moment, I tried the Cloze Test & scored 1 point (for ‘photo’). Actually, I got bored & didn’t finish it, then panicked that 6 years of High School & two degrees were wasted until I read it was down to lack of knowledge of context, in this case Facebook – phew!
My own experience in the training field (with young professionals/ academics) confirms that using open-ended questions combined with letting people choose their own words is an essential & useful tactic - but you've got to work at it.
Having conversations / discussions prior to writing up can be a big time investment. It’s important to get tangible results but talking is ephemeral. So I always try to capture key words on paper or flip chart in order to work them into text for later use. If you don’t do this - like the Sale Deals in shops, ‘when it’s gone, it’s gone’.

Comment by David Smallwood posted on 06 April 2016

Not sure what: "When you ask people to self-identifying language you allow them to have a direct influence on the copy" means . . .

Replies to David Smallwood:
  
  Comment by Christine Cawthorne posted on 06 April 2016
  
  Hi David, sorry for the confusion, I meant when you let people tell you how they feel in their own words, you can use the same language in the copy, thereby letting your audience have a direct influence on the copy.
  
  Thanks for pointing that out - I'll tweak the post.
  
