ChatGPT Health fails to flag over 50% of medical emergencies
- 27 February 2026
- A study found that ChatGPT Health under-triaged 52% of serious medical emergencies
- It over-reacted in lower-risk cases, with 64.8% of safe individuals incorrectly told to seek immediate medical care
- Suicide alerts sometimes triggered in lower-risk scenarios while failing to appear in higher-risk cases
ChatGPT Health incorrectly triages more than half of medical emergencies and frequently fails to detect suicidal ideation, a study has found.
OpenAI launched ChatGPT Health in January 2026, allowing users in the US to connect their medical records to receive health advice.
It is used for health advice by around 40 million US adults each day, according to figures from OpenAI.
But an independent safety evaluation, published in Nature Medicine on 23 February, found that the AI tool under-triaged 52% of ‘gold-standard emergencies’.
This included directing patients with diabetic ketoacidosis and impending respiratory failure to 24–48-hour evaluation rather than the emergency department.
Lead author of the study, Dr Ashwin Ramaswamy, said: “ChatGPT Health performed well in textbook emergencies such as stroke or severe allergic reactions.
“But it struggled in more nuanced situations where the danger is not immediately obvious, and those are often the cases where clinical judgment matters most.”
Researchers at the Icahn School of Medicine at Mount Sinai created 60 patient scenarios, from mild illness to medical emergencies.
Each case was reviewed by three independent doctors using established clinical guidelines, to see what level of care was needed.
The team then generated nearly 1,000 responses from the AI under varying conditions, including changes to patient gender, the addition of lab results and comments from family members, before comparing the system’s recommendations against doctors’ assessments.
In one simulation, it sent a suffocating woman to a future appointment she may not have survived in eight out of 10 (84%) of attempts.
However the AI tool over-reacted in lower-risk cases, with 64.8% of safe individuals incorrectly told to seek immediate medical care.
ChatGPT Health was designed to direct users to a suicide crisis line in high-risk situations, but researchers found that these alerts sometimes triggered in lower-risk scenarios while failing to appear when users described specific plans for self-harm.
“The system’s alerts were inverted relative to clinical risk, appearing more reliably for lower-risk scenarios than for cases when someone shared how they intended to hurt themselves.
“In real life, when someone talks about exactly how they would harm themselves, that’s a sign of more immediate and serious danger, not less,” the researchers said.
The study also found that when family or friends minimised symptoms, the majority of triage recommendations shifted toward less urgent care.
However, the research team said that the findings do not suggest consumers should abandon AI health tools.
Alvira Tyagi, a medical student and second author of the study, said: “These systems are changing quickly, so part of our training now must consider learning how to understand their outputs critically, identify where they fall short, and use them in ways that protect patients.”
Commenting on the study, Alex Ruani, a doctoral researcher in health misinformation mitigation at University College London, said that the findings are “unbelievably dangerous”.
“What worries me most is the false sense of security these systems create.
“If someone is told to wait 48 hours during an asthma attack or diabetic crisis, that reassurance could cost them their life,” she said.
OpenAI told Digital Health News that the study does not reflect how people typically use ChatGPT Health or how the product is designed to function in real-world health scenarios.
