One of the things I’m focusing on at the moment is starting with the right building blocks: evaluating the quality of the national health data we produce and asking “is it good enough?”

I’m imagining now that many of you are about to stop reading. I can just hear people thinking: ‘Goodness! A column on data quality; could he have picked a drier subject? And anyway, data; surely it’s just right or wrong?’

But please, the column will be interesting, I promise. Or at the very least, it will deal with a subject that I think is really important. So here goes…

Taking data on a journey (literally) 

I remember, back in 2001, jumping into my car carrying a password-encrypted disk full of data from the hospital I was working at and driving down to Warwick Technology Park to get it into ‘Clearnet’. This used to be the national collection of healthcare data.

The reason I had to physically drive the disk to the park was to make sure that the information was submitted in time for our contractual deadlines; there were no ‘secure file transfer protocols’ then for data sets.

I remember entering the numbers that represented the trust’s activity into a template on the floppy disk, and thinking: “There must be a better way of doing this!”

The experience got me thinking about how to create and then send a single piece of data on a journey to arrive into a health data system. I thought about how vast the uses of that single piece of data can potentially be, and how important it therefore is that we get it right.

Fifteen years on, and with more hospitals having some sort of electronic patient record in place than not, there has never been a more important time to reconsider that journey, and to make sure that data entered and created in health systems is correct.

One bit of data, many uses

I have only been admitted to hospital once (thankfully, touch wood), but it made me think about my record (as you do, when you’re an individual with geek data roots) and how many uses it may have. I noticed how many times the administrator, doctor and nurse opened my record, and how much use they made of it.

But what a journey any data item makes. It’s inputted, and then… extracted, processed and loaded into another data holding system, extracted again and changed into a different format.

It’s copied and submitted to the national system and then extracted again by a commissioning/funding organisation for review and payment. It’s held by the original hospital, centrally and nationally, and by commissioners/funders.

It may also be provided to a number of other organisations to carry out research, performance monitoring and benchmarking. And all that comes on top of its clinical use for me as part of my medical history.

We have some work underway to review and streamline this process, increasing access to data while disseminating fewer copies of it.

There are also some excellent national and local initiatives underway that will mean that, just as we can access our bank details, car registration, passport information, and holiday details, we will be able to access our health record.

When these become mainstream, they can only help the data quality agenda; as will other access options such as apps, as Ade (Byrne from Southampton) nicely summed up for us a few weeks ago.

Focusing effort on getting collection right

In the meantime, I am passionate about improving the quality of the data the health service generates, and making its journey and use more efficient. In driving forward innovative data science work, it will be vital that the health data building blocks are correct to start with.

At the Health and Social Care Information Centre (which will become NHS Digital this summer), we are about to kick off some discovery work to look at collecting some aspects of data direct from providers.

The hope is that this will reduce the burden on them, improve timeliness and, most importantly, underline that collecting data as close to the point of care as possible gives us the best possible chance of receiving good quality data.

I have been fortunate enough to be involved in a range of peer-reviewed journal publications, using large health data assets (I prefer this term to ‘big data’ as an asset has a much better definition than just ‘big’).

The rigour with which calculations are reviewed, picked apart, improved, commented on and re-done for these publications is impressive. You have to clear a very high bar, which is a credit to the research community.

Once I’d been involved in a couple of these publications, I started to consider the amount of effort put into reviewing each other’s stats, calculations and quality of research, and to ask whether we put anything like the same amount of effort into getting the underlying data correct in the first place.

Hmmm. Maybe this is a leading question. But I worry about the extent to which we could be building a house on sand. We’re focusing all our attention on the quality of the bricks and the roof; not such a good idea if the foundations sink and it all falls down.

Apologies if this all sounds a bit bleak. Don’t get me wrong: data quality has significantly improved, but there’s work to be done.

New guidance – please use it!

As I said at the outset, the question it all boils down to for me is this: “Do we all place as much importance as we should on the quality of health data creation? Do we pay as much attention to it as we do to the technology that underpins it, and to its primary and secondary use?”

If we can increase the use of the data, then hopefully we will get individuals engaged in getting it entered correctly at source. The less we move data around, the better it will be, as every time data is transferred there is the potential for error to be introduced.
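To put some purely illustrative numbers on that last point, here is a minimal sketch in Python. It assumes that every transfer has the same small chance of introducing an error into a record, and that transfers are independent of each other; both are simplifications. The function name intact_fraction and the 1% error rate are hypothetical, chosen only for the example, and are not measured figures from any health data system.

```python
def intact_fraction(per_transfer_error_rate: float, transfers: int) -> float:
    """Expected fraction of records still error-free after a number of transfers.

    Assumes each transfer corrupts a record independently with the same
    probability, which is a simplification made for illustration only.
    """
    if not 0.0 <= per_transfer_error_rate <= 1.0:
        raise ValueError("per_transfer_error_rate must be between 0 and 1")
    if transfers < 0:
        raise ValueError("transfers must not be negative")
    # A record stays intact only if it survives every single transfer.
    return (1.0 - per_transfer_error_rate) ** transfers


if __name__ == "__main__":
    # Hypothetical error rate of 1% per transfer; not a measured figure.
    for hops in (1, 3, 6):
        share = intact_fraction(0.01, hops)
        print(f"{hops} transfers: {share:.1%} of records intact")
```

Under those assumptions, even a 1% chance of error per transfer leaves only about 94% of records untouched after six transfers, which is the arithmetic behind wanting fewer hops between the bedside and the database.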

Put those two thoughts together, and the future has got to be more direct feeds of data from the bedside into systems and databases. Fortunately, there are lots of examples of this around the country, with more and more starting to emerge.

Incredibly, two of the oldest but still most useful documents to support this quest come from Ivanov in 1972 and the World Health Organisation in 2003.

We also published a Data Quality Maturity Index and supporting documents yesterday, along with a forum that we want to open up to anybody interested in this work. So it’s time to set off on another journey. Let’s make every record and piece of data count, and put the required focus into ensuring that the building blocks on which we base so many decisions are correct.

Daniel Ray


Daniel Ray has worked in health informatics for 17 years. Until recently, he was director of informatics at a large teaching hospital, where he transformed the health informatics service, set up a quality outcomes research unit, and developed a patient portal among other work programmes.

He recently joined the Health and Social Care Information Centre (soon to become NHS Digital) as director of data science. He is also honorary professor of health informatics at UCL’s Farr Institute of Health Informatics, where he gets involved in leading-edge research and teaching.