UK Biobank has today unveiled incredible new data from whole genome sequencing of its half a million participants, a step that is set to drive the discovery of new diagnostics, treatments and cures.  

The data, uniquely, is available to approved researchers worldwide via a protected database containing only de-identified data (identifying details such as name, address, date of birth and name of GP have been stripped out).

This abundance of genomic data is unparalleled, but what cements it as a defining moment for the future of healthcare is its use in combination with the existing wealth of data UK Biobank has collected over the past 15 years on lifestyle, whole body imaging scans, health information, and proteins found in the blood. 

After five years, more than 350,000 hours of genome sequencing, and over £200 million of investment, UK Biobank is releasing the world’s largest-by-far single set of sequencing data, completing the most ambitious project of its kind ever undertaken.

“This is a veritable treasure trove for approved scientists undertaking health research, and I expect it to have transformative results for diagnoses, treatments and cures around the globe,” said Professor Sir Rory Collins FRS FMedSci, Principal Investigator at UK Biobank. 

Game-changing data for health research 

Today’s addition of sequencing data comes after a series of great leaps made using the vast UK Biobank biomedical database.

These leaps include: finding genes associated with protection against obesity and type 2 diabetes, which has the potential to lead to the development of new drugs; identifying individuals at very high genetic risk for diseases such as heart disease, breast cancer and prostate cancer, which may help with screening; and finding a link between activity and Parkinson’s that can predict the disease up to seven years before diagnosis from smartwatch data, potentially leading to early intervention. The new sequencing data will dramatically enhance the existing data’s potential.

Whole genome sequencing data on this scale, combined with UK Biobank’s existing data and biological samples, is expected to lead to more targeted drug discovery and development, the discovery of thousands of disease-causing non-coding genetic variants, the acceleration of precision medicine and better understanding of the biological underpinnings of disease.  

“Researchers can now apply to access de-identified full genome data from half a million participants, alongside a rich combination of medical, biochemical, lifestyle and environmental data from volunteers involved,” said Professor Dame Ottoline Leyser DBE FRS, chief executive of UK Research and Innovation (UKRI).

“Today marks an important milestone in UKRI’s commitment to realise the potential of genetics for biomedical research, innovation and translation to the clinic.”   

Set up 20 years ago, the charity UK Biobank recruited half a million altruistic volunteers to create the world’s most comprehensive source of health data. It is used by researchers across the world, from academic, commercial, government and charitable settings, for scientific discoveries that improve human health.  

UK Biobank now provides the most detailed picture of human health that exists, equipping researchers with the ultimate toolbox to make previously out-of-reach links and discoveries about disease development possible. 

“The sheer amount of genetic data is exceptional – it is twice as much as anywhere else – but UK Biobank’s data is so illuminating because we’ve been able to follow the health of our brilliant volunteers for around 15 years,” said Professor Collins. 

More than 9,000 peer-reviewed papers from platform 

To date, over 30,000 researchers from more than 90 countries have registered to use UK Biobank, with over 9,000 peer-reviewed papers published as a result. Researchers are given the tools and computing power to analyse the de-identified data via UK Biobank’s secure, cloud-based Research Analysis Platform.

This project was funded by Wellcome, UKRI and four biopharmaceutical companies: Amgen, AstraZeneca, GSK and Johnson & Johnson. This data, and the rest of UK Biobank’s de-identified data, is now globally accessible for approved researchers on the UK Biobank Research Analysis Platform, which is hosted on Amazon Web Services (AWS) in the London region and enabled by DNAnexus.

This is the first time that a globally accessible resource, together with the computing power and storage needed to analyse data of this size and type, has been made available to researchers.

The four pharmaceutical companies plan to publicly share their summary statistical analyses arising from the consortium collaboration, including genome-wide association results, providing the research community with highly valuable insights without the costly and time-consuming burden of analysing raw data.