AI testing platform removes commercial bias for NHS use
- 27 November 2025
- Researchers have developed a platform to determine whether commercial AI algorithms are fit for NHS use
- It used diabetic eye disease as the first example
- The study found that the platform removed biases that can come from companies wanting to deploy AI software in clinical settings
Researchers have developed a testing platform to determine whether commercial AI algorithms are fit to detect disease fairly in the NHS, using diabetic eye disease as the first example.
A study, published in The Lancet Digital Health on 24 November 2025, found that the platform removed biases that can come from companies wanting to deploy AI software in clinical settings, putting firms on a level playing field.
The platform was used to compare commercial AI algorithms designed to detect diabetic eye disease by identifying signs of blood vessel damage at the back of the eye.
Professor Alicja Rudnicka, who co-led the study at City St George’s, University of London, said: “Our revolutionary platform delivers the world’s first fair, equitable and transparent evaluation of AI systems to detect sight-threatening diabetic eye disease.
“This depth of AI scrutiny is far higher than that ever given to human performance. We’ve shown that these AI systems are safe for use in the NHS by using enormous data sets, and most importantly, showing that they work well across different ethnicities and age groups.”
The study was co-led by Adnan Tufail at Moorfields Eye Hospital NHS Foundation Trust, in collaboration with Kingston University and Homerton Healthcare NHS Trust.
A trusted research environment of independent researchers was built, and 25 companies with CE-marked algorithms were invited to take part in the study, of which eight accepted.
The performance of the eight algorithms was compared with the grading of images by up to three humans who followed the standard protocol used in the NHS.
Vendor algorithms did not have access to human grading data and companies were excluded from the data ‘safe haven’ where the images were being analysed by their algorithms.
In total, 202,886 screening visits were evaluated, representing 1.2 million images from 32% white, 17% Black, and 39% South Asian ethnic groups.
The AI systems took 240 milliseconds to 45 seconds to analyse all images per patient, compared with up to 20 minutes for a trained human.
Across the AI algorithms, accuracy in identifying diabetic eye disease potentially in need of clinical intervention ranged from 83.7% to 98.7%.
Accuracy was 96.7-99.8% for moderate-to-severe diabetic eye disease and 95.8-99.5% for the most advanced (proliferative) sight-threatening diabetic eye disease.
This compares with a previously published study in which the accuracy of humans manually grading images for these levels of diabetic eye disease ranged from 75% to 98%, showing that the AI algorithms performed the same as, or even better than, a human in a fraction of the time.
Professor Sarah Barman of Kingston University, who was involved in the study, said: “This large-scale evaluation of the effectiveness of AI algorithms has allowed us to demonstrate how different algorithms perform across subgroups of the population.
“It also provides a clear approach that can be applied to other medical domains to help ensure that AI is fair and works well for everyone.”
Meanwhile, in November 2024, NHS England announced plans to provide advanced eye scans in the community for people with diabetes, potentially saving up to 120,000 hospital appointments a year.
