De-identified UK Biobank health data accidentally published online
- 17 March 2026
- UK Biobank confirmed that volunteers' de-identified health data has sometimes been unintentionally published online by researchers
- An investigation found that one dataset contained millions of hospital diagnoses and associated dates for more than 400,000 participants
- The biobank's CEO said that any data put online by researchers did not contain personally identifying information
UK Biobank has confirmed that volunteers’ de-identified health data has sometimes been unintentionally published online by researchers.
The biomedical database, established in 2003, contains in-depth genetic, lifestyle, and health data from 500,000 UK participants, which is used to advance medical research.
An investigation published by The Guardian on 14 March found that de-identified health data has been exposed online on dozens of occasions. One of the published datasets contained millions of hospital diagnoses and associated dates for more than 400,000 participants.
A UK Biobank volunteer gave a reporter specific personal health information, which was cross-referenced with UK Biobank data to establish that the person was a participant.
In this re-identification test, the reporter was able to pinpoint the volunteer's hospital diagnosis records using their month and year of birth and details of a major surgery they had undergone.
In response, Professor Sir Rory Collins, chief executive and principal investigator at UK Biobank, published a statement asserting that volunteers’ personal information in the biobank is safe.
He admitted that “in a small proportion of cases, some scientists have unintentionally put these de-identified data along with their research findings on websites which are publicly accessible”.
But he said that these data did not contain personally identifying information such as names, addresses, dates of birth or NHS numbers.
“After 14 years of making your data available for scientific discovery, we have no evidence of any of you being unwillingly identified.
“Ensuring your personal information is used safely and correctly is our number one priority.
“There has not been any hack or data breach of UK Biobank, and if there had been you would have heard about it from us,” Prof Collins said.
More than 20,000 scientists around the world have been approved to use volunteers’ de-identified data to discover how to prevent and treat disease.
“Even though the data are de-identified before being made available to researchers, we don’t want them to be used by researchers who have not gone through our rigorous access review process.
“Consequently, we’ve taken steps to help researchers avoid putting any de-identified data on code repositories, and to ensure that they are removed rapidly if it occurs,” Prof Collins said.
UK Biobank first detected de-identified participant data on an online code repository in 2022; Prof Collins said the data were removed immediately.
The biobank has previously denied claims that researchers from a ‘race science’ network accessed volunteers’ health information in 2023 and that health data was shared with insurance companies multiple times between 2020 and 2023.
Prof Collins said that to prevent future incidents, UK Biobank has introduced mandatory training on data security, built a tool for researchers to check their code, and developed automated tools that search code repositories for participant data.
Meanwhile, in February, the government granted approval for UK Biobank researchers to access coded GP patient data for research purposes, which is expected to double the number of recorded cases of health conditions commonly handled by GPs.
