How to turn NHS data into ‘gold’
13 August 2025
The NHS holds vast stores of data it cannot use or sell. The “sidestep” of synthesised data is the answer, write NHS trust colleagues Martin Farrier and David Chapman
Remember Care.data? Or perhaps, like much of the world, you have chosen to forget. It was the botched attempt to pull GP data into a central database. The plan was for it then to be anonymised and used by planners, researchers and other institutions. Some of those other institutions would be outside the NHS.
The Care.data programme was brought to its knees by issues related to confidentiality and the ability to opt out.
But that was a decade ago. Since then, the value of data has risen. It’s often said to be the new oil. The world’s biggest companies are data-based. The NHS holds one of the biggest databases in the world. So surely the NHS should be one of the biggest and richest companies in the world. Obviously, that’s not true, so there must be a defect in the argument.
Here is the defect: we have data but we can’t use data.
Some of that’s about investment, and the data being in the wrong format, as well as the rather confidential nature of the data that we hold. Of course, that hasn’t got in the way of the world’s biggest companies. They gather very personal data and share it all the time. We still want to be trusted and valued, so it’s crucial that we play by the rules and carry trust with us.
Secure data environments
The NHS is still trying to find big solutions to the problem of sharing data. There is a £200 million plan to develop secure data environments. These are intended to be safe ways of sharing data. They are a step forward, but they still suffer from problems of coordination and incompatible data, and they still need consent.
If data in secure data environments is going to have a wide clinical use, and perhaps be commercially available, we will have to revisit the debate that brought down Care.data.
There are lots of suggestions about how we can anonymise data so that it cannot be identified. Those suggestions tend to break down because of the nature of our data. It’s never going to be completely non-identifiable. Rare diseases happen to small numbers of people, and in a group that small, individuals become identifiable even without their names.
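The small-numbers problem can be made concrete. The sketch below is purely illustrative: the records are invented, the field names and the threshold of five are assumptions, and it is not a description of any NHS disclosure-control process. It simply counts how many records share each combination of characteristics and flags the combinations that apply to very few people.

```python
from collections import Counter


def small_groups(records, fields, k=5):
    """Return each combination of field values shared by fewer than k records."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Invented records with names and NHS numbers already stripped out.
records = (
    [{"age_band": "40-49", "district": "A", "diagnosis": "asthma"}] * 6
    + [{"age_band": "40-49", "district": "A", "diagnosis": "rare condition X"}]
)

risky = small_groups(records, ["age_band", "district", "diagnosis"], k=5)
print(risky)  # {('40-49', 'A', 'rare condition X'): 1}
```

The six asthma patients hide among each other. The one patient with the rare condition does not, even though no name appears anywhere in the data.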
Globally agreed format
But even if we could anonymise data and then share it, the format problem bites back. We tend to have the data in formats that aren’t comparable. My data and your data aren’t easy to compare. We need data that is stored in the same format.
Behind the scenes, there are moves forward. The first is to develop an agreed format to express our data. That exists. OMOP – Observational Medical Outcomes Partnership – is a globally agreed format that standardises data from medical sources and allows the data to be analysed.
Anyone who is interested in big data and academic analysis would like our data in an OMOP format. It makes my data and your data easier to investigate because they are the same. They could also be combined.
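To show why a shared format matters, here is a minimal sketch of two organisations converting their own exports into one common shape. Everything in it is an assumption made for illustration: the two source layouts are hypothetical, and the target fields (`year_of_birth`, `sex`) are a simplified stand-in rather than the real OMOP tables, which are far richer.

```python
def from_trust_a(row):
    # Hypothetical Trust A export: date of birth as "DD/MM/YYYY", sex spelled out.
    return {
        "year_of_birth": int(row["dob"].split("/")[2]),
        "sex": row["sex"][0].upper(),
    }


def from_trust_b(row):
    # Hypothetical Trust B export: birth year as a number, sex coded 1 = male, 2 = female.
    return {
        "year_of_birth": row["birth_year"],
        "sex": {1: "M", 2: "F"}[row["sex_code"]],
    }


trust_a = [{"dob": "03/07/1961", "sex": "Female"}]
trust_b = [{"birth_year": 1984, "sex_code": 1}]

# Once both are in the same shape, they can be analysed as a single dataset.
combined = [from_trust_a(r) for r in trust_a] + [from_trust_b(r) for r in trust_b]
print(combined)
# [{'year_of_birth': 1961, 'sex': 'F'}, {'year_of_birth': 1984, 'sex': 'M'}]
```

Each organisation writes its own conversion once. After that, any analysis written against the common shape works on everyone’s data.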
The next move forward is more of a sidestep. We need to be able to sidestep the problem of anonymised data. The sidestep is synthesised data. If we use our data to create a synthetic copy, it is no longer identifiable because none of the patients in the dataset is real. They are synthesised. However, they are synthesised in such a way that the data remains accurate enough for academic investigation.
Synthetic data could be safely shared or even sold. It’s not patients’ data, but a reliable, accurate replica.
At Wrightington, Wigan and Leigh NHS Foundation Trust we have started this journey. We now have many of our datasets available in OMOP format. Working with the Hartree Centre at the Science and Technology Facilities Council, we have built a model that can test the accuracy, and more importantly ensure the privacy, of synthetic data. We can now start the process of creating synthetic data in an OMOP model and testing it to demonstrate its accuracy.
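For readers who want a feel for what testing accuracy and privacy can mean, here is a toy sketch. It is not the model we built with Hartree. It assumes the simplest possible generator (each column drawn independently from the real data), invented records, and two deliberately crude checks, all of which are our illustration only.

```python
import random
from collections import Counter


def synthesise(real_rows, n, seed=0):
    """Build n synthetic rows, drawing each column independently from the real data."""
    rng = random.Random(seed)
    columns = list(real_rows[0].keys())
    return [{c: rng.choice(real_rows)[c] for c in columns} for _ in range(n)]


def proportion(rows, column, value):
    """Share of rows where column equals value (a crude accuracy measure)."""
    return sum(r[column] == value for r in rows) / len(rows)


def unique_rows_reproduced(real_rows, synthetic_rows):
    """Count synthetic rows that match a real row held by only one patient."""
    counts = Counter(tuple(sorted(r.items())) for r in real_rows)
    unique = {key for key, n in counts.items() if n == 1}
    return sum(tuple(sorted(s.items())) in unique for s in synthetic_rows)


# Invented "real" data: two common groups and one patient with a rare condition.
real = (
    [{"age_band": "40-49", "diagnosis": "asthma"}] * 30
    + [{"age_band": "70-79", "diagnosis": "COPD"}] * 70
    + [{"age_band": "20-29", "diagnosis": "rare condition X"}]
)
synthetic = synthesise(real, n=1000)

# Accuracy: is asthma about as common in the synthetic data as in the real data?
gap = abs(proportion(real, "diagnosis", "asthma")
          - proportion(synthetic, "diagnosis", "asthma"))

# Privacy: has the generator recreated the one patient who stands out?
reproduced = unique_rows_reproduced(real, synthetic)
print(gap, reproduced)
```

A generator this naive keeps the overall proportions but loses the links between columns, so it would produce combinations that never occur in the real data. A serious model has to preserve those links while still never handing back a real patient, which is why the testing matters as much as the generation.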
These are steps forward a decade on from discovering how difficult it was to share health data safely. They are important steps, and they aren’t especially difficult. They can allow each organisation to bring their data to a place where it can be safely shared and utilised. That might then get us to the point where healthcare data is valuable. Perhaps not oil. More like gold.
Martin Farrier is director of digital medicine and chief clinical information officer at Wrightington, Wigan and Leigh NHS Foundation Trust (WWL). David Chapman is chief data analytics officer at WWL.