ECRI names AI chatbot misuse as top health technology hazard for 2026

Tombstone icon

Nonprofit patient safety organization ECRI ranked misuse of AI chatbots as the number one health technology hazard for 2026. ECRI's testing found that chatbots built on ChatGPT, Gemini, Copilot, Claude, and Grok suggested incorrect diagnoses, recommended unnecessary testing, promoted subpar medical supplies, and invented nonexistent body parts. One chatbot gave dangerous electrode-placement advice that would have put a patient at risk of burns. OpenAI reported that over 5 percent of all ChatGPT messages are healthcare related, with 200 million users asking health questions weekly, despite the tools not being validated or approved for healthcare use.

Incident Details

Severity:Catastrophic
Company:OpenAI, Google, Microsoft, Anthropic, xAI
Perpetrator:AI chatbot
Incident Date:
Blast Radius:200 million weekly ChatGPT health users; clinicians, patients, and hospital staff using unvalidated AI chatbots for medical decisions
Advertisement

The Organization Hospitals Trust Most Issues Its Warning

ECRI has been publishing its annual Top 10 Health Technology Hazards list since 2008. The independent, nonpartisan patient safety organization evaluates medical technologies, investigates safety incidents, and advises hospitals and health systems on what to watch out for. When ECRI ranks a technology as a top hazard, healthcare organizations pay attention - that's the point of the list.

For 2026, ECRI placed the misuse of AI chatbots in healthcare at the very top, ahead of unpreparedness for "digital darkness" events (sudden loss of access to electronic health records), substandard and falsified medical products, recall communication failures for home diabetes devices, and cybersecurity risks from legacy medical equipment. AI had appeared on ECRI's hazard list before - insufficient governance of AI in medical technologies placed fifth in 2024, and AI risks topped the 2025 list - but the 2026 report sharpened its focus specifically on general-purpose chatbots being used in clinical contexts they were never built for.

ECRI's principal project officer for device safety, Rob Schluth, noted during a webinar that the year's number one hazard isn't even a medical device. ECRI deliberately tested only commonly available chatbots based on large language models - ChatGPT, Claude, Copilot, Gemini, and Grok - the same ones used by patients browsing their phones at 2 AM with a weird symptom and clinicians looking for a quick reference between patients.

What ECRI Found When It Tested the Chatbots

ECRI's patient safety experts ran the major chatbots through medical scenarios and documented the results. The findings were specific and concrete.

The chatbots suggested incorrect diagnoses. They recommended unnecessary testing. They promoted subpar medical supplies. And they invented body parts - generated confident-sounding medical terminology for anatomical structures that do not exist - while responding to medical questions with the authoritative tone of a seasoned specialist.

One test stood out. ECRI asked a chatbot whether it would be acceptable to place an electrosurgical return electrode over the patient's shoulder blade. Electrosurgical return electrodes (often called grounding pads) are used during surgery to safely conduct electrical current away from the patient. Their placement matters: put the pad in the wrong location and the current can concentrate in a small area, causing burns. The chatbot incorrectly stated that placing the electrode on the shoulder blade was appropriate. If a clinician had followed that advice, the patient would have been at risk of a surgical burn.

This failure is particularly informative because it illustrates how the chatbot's error pattern works. The chatbot did not say "I don't know" or flag uncertainty. It gave a clear, affirmative answer - the kind of answer a busy clinician in the middle of a procedure might accept without a second thought, because it sounded like the answer a knowledgeable colleague would give. The confidence was indistinguishable from expertise, and the advice was wrong.

The Scale of the Problem

The numbers behind ECRI's concern are hard to argue with. OpenAI reported in January 2026 that more than 5% of all messages sent to ChatGPT are about healthcare. With 800 million regular users, that means one quarter of ChatGPT's user base - roughly 200 million people - asks healthcare questions every week. OpenAI also cited a figure of more than 40 million people using ChatGPT daily for health information.

None of these users are interacting with a medical device. ChatGPT, Claude, Gemini, Copilot, and Grok are general-purpose language models. They have not been validated for healthcare use. They have not been approved or cleared by the FDA as medical devices. They carry disclaimers saying as much. And 200 million people a week use them for health questions anyway.

The gap between how these tools are designed to function and how they are actually used is the core of ECRI's concern. A chatbot trained to predict the next plausible word in a sequence will say "yes, that placement is appropriate" if "yes" is the statistically likely response given the phrasing of the question. It does not run the answer through anatomical knowledge, check it against surgical safety guidelines, or consider whether the response could physically harm someone. It generates text that sounds like an expert answer.

Why People Trust the Wrong Tool

ECRI president and CEO Marcus Schabacker, who holds both an MD and a PhD, put it directly: "When something seems helpful and definitive, people begin to rely on it without question. That's the problem."

The human tendency to trust confident-sounding answers is well-documented in psychology, and it's especially dangerous in a medical context. A patient Googling a symptom has always faced the risk of misinterpreting what they find, but search results at least present multiple sources that can be compared. A chatbot delivers a single, fluent, authoritative-sounding answer. There is no second opinion on the screen. The interface is designed to feel like a conversation with a knowledgeable entity, and the entity never hedges, never says "actually, I'm a text prediction engine and you should talk to a doctor about this."

Clinicians are not immune to this. ECRI found that healthcare professionals are increasingly using general-purpose chatbots for clinical reference, quick lookups, and decision support. A doctor who asks a chatbot for a medication interaction check and receives a clear, confident answer may not realize that the answer was generated by pattern-matching rather than pharmacological analysis. The output looks the same either way.

The Bias Dimension

ECRI's report also flagged that AI chatbots can exacerbate existing health disparities. Large language models are trained on datasets that reflect historical biases in medical literature and practice. If the training data underrepresents certain populations or encodes biased treatment patterns, the chatbot's responses will reflect those biases.

This is a problem that predates AI chatbots - biased treatment patterns in human medicine are well-documented - but chatbots can amplify the issue in two ways. First, they deliver biased information with the same confident tone they use for everything else, making it harder for the user to recognize when advice may be based on skewed data. Second, they scale the distribution of that biased information to millions of interactions per day, turning individual clinical biases into population-wide exposure.

The Access Trap

ECRI identified a compounding factor that makes the chatbot hazard especially difficult to address: rising healthcare costs and hospital closures are pushing more people toward AI chatbots as substitutes for professional medical care. When a clinic closes in a rural area, or when a patient can't afford a specialist visit, the free chatbot on their phone becomes the path of least resistance.

This creates a situation where the people most likely to rely on AI chatbots for health decisions are often those with the fewest alternatives and the least ability to absorb the consequences of bad advice. A patient with good insurance and a primary care physician might use ChatGPT to research a symptom before their appointment. A patient without either might use ChatGPT instead of the appointment entirely.

ECRI did not frame this as a reason to ban chatbots - "ECRI is not claiming that AI chatbots are inherently dangerous," the organization stated - but as a reason to understand the risk clearly. The tools can be useful for education and general information. The danger emerges when they are treated as substitutes for medical judgment, whether by patients who have no other option or by clinicians who find them convenient.

What ECRI Recommended

ECRI's recommendations were practical: recognize the limitations of AI chatbots, use them as educational resources rather than decision-making aids, and keep a human "in the loop" by confirming important information with trusted sources and qualified professionals. The organization's full report, available to ECRI members, includes detailed steps for healthcare organizations to reduce AI chatbot-related risks.

ECRI also held a live webcast on January 28, 2026, titled "The Misuse of AI Chatbots in Healthcare: Risks, Realities, and Responsible Innovation," featuring discussion from ECRI's patient safety experts on the specific dangers their testing had uncovered.

Schabacker's framing captured the core tension: "Medicine is a fundamentally human endeavor. While chatbots are powerful tools, the algorithms cannot replace the expertise, education, and experience of medical professionals. Realizing AI's promise while protecting people requires disciplined oversight, detailed guidelines, and a clear-eyed understanding of AI's limitations."

The Irony of Timing

ECRI published its hazard list at almost exactly the same moment that OpenAI was preparing to launch ChatGPT Health, a dedicated healthcare experience built into its chatbot. OpenAI's own data showed 200 million people a week already asking ChatGPT health questions, making a formal health product a commercial inevitability. The patient safety organization most trusted by hospitals was warning that unvalidated chatbots are the number one health technology hazard, while the company behind the most popular chatbot was building a health-specific product to meet the demand those warnings described.

Whether that product would address the specific failures ECRI documented - the invented body parts, the incorrect electrode placement advice, the wrong diagnoses - remained to be seen. ECRI notably excluded purpose-built health applications like ChatGPT Health and Open Evidence from its testing, focusing only on the general-purpose chatbots that most people actually use. The gap between what specialized health AI tools could offer and what 200 million people were already doing with general-purpose chatbots was the hazard ECRI was flagging - not a theoretical future risk, but a present-tense reality measured in hundreds of millions of medical questions answered by tools that were never designed to answer them.

Discussion