the gullible, scattershot world of ai medicine
While medical AI is flourishing in labs, chatbots being used for medical help and advice are stumbling, and badly.
One of the biggest promises of AI is improving the quality and lowering the cost of medicine. Forget trying to book an appointment with a doctor: you have one in your pocket, ready to help you any time, day or night. It will listen to you for as long as it takes and catch things a human doctor might miss. Which, yes, can be true in some cases, but not in others. And more often than not, it may just diagnose you with an illness you not only don't have, but one that doesn't even exist.
In fact, while AI can notably improve diagnostic accuracy and even help figure out rare conditions, it can also misdiagnose 80% of early-stage conditions. Hold on just a minute though. How can AI both solve complex medical problems and help catch early-stage cancer, yet fumble four in five initial diagnoses at the same time? Does this mean that AI-enabled medical breakthroughs are the exception, and we only hear about the lucky 20% of cases?
Well, not exactly. AI models used to help diagnose cancer and chatbots used by an average person are very different entities. One is specifically designed to work with very precise data within the context of a medical practice; the other is just a general-purpose model trained to predict plausible responses to natural human language. The abysmal 20% success rate is a reflection of these chatbots' efforts, and it happens for two reasons.
First, when you feed them the standard vignettes used to train med school students, they lack the ability to reason and dive into the nuances of the problem, and a lot of medical issues come down to the nuances. Symptoms are similar across many illnesses, treatments can be similar as well, and it's only specific tests that can nail down specific conditions, tests you know to order not by regressing to the mean, but by asking for clarification and thinking through all the possibilities.
Your runny nose is probably just a cold. But it may be a sign of the flu. Or you may be allergic to an animal, or a plant. Or it may just be irritated. Or you could have a cancerous growth in your sinuses. How do you narrow it down? Most diagnoses in doctors' offices are made by a process of elimination, not by just plugging patients' symptoms into a calculator. On top of that, there are symptoms patients miss, or don't know are symptoms, or deny out of embarrassment. A chatbot designed to flatter its users isn't going to pry and second-guess.
Second, they make no value judgments about the information they ingest. In a different study, doctors saw chatbots diagnosing users who had itchy, dry eyes with bixonimania, a skin condition that gives you a rash on your eyelids and dry, itchy eyes. It's a minor condition first documented in 2024 by Dr. Lazljiv Izgubljenovic, who, like bixonimania, doesn't actually exist. Two posts on Medium and two preprints on a LinkedIn for scientists, and that was more than enough for chatbots to accept it as fact.
The hoax was very obvious, and purposefully so. The preprints thank Star Trek's Starfleet Academy for the use of medical facilities on the USS Enterprise, and the Professor Sideshow Bob Foundation and the Fellowship of the Ring for funding the research. Even the Sokal affair, designed to expose postmodernism, was more subtle than this, and the editors of Social Text at least asked if they were dealing with an honest, real paper. The chatbots? They swallowed it hook, line, and sucker.
So, if actual researchers testing chatbots can casually invent a new disease with a few blog posts, just imagine the unhinged nonsense Claude and ChatGPT will happily regurgitate should a motivated crank with a fake journal publish a dozen papers on the scourge of "moneyexcessavitis," which upper-middle-class, health-conscious, heavy social media users are apparently contracting at epidemic rates, and for which the only solution is to buy his special supplement course for $79.99 per month over the next two, or five, or eleven years.
I'd argue that this lack of critical thinking and healthy skepticism is what prevents AI assistants from being as useful as we hoped. In 1998, philosopher Nick Bostrom posited that given enough processing power and access to information, a super-intelligent AI was just a matter of time, building on an influential but widely misunderstood paper by computer scientist and sci-fi author Vernor Vinge, and laying the foundations of today's cult-like sentiments in Silicon Valley. And as far back as the late 2000s, people like me have been warning that just accumulating and indexing data at high speed is not enough.
But while chatbots don't have the ability to think critically about the information coming their way, we do, even if we don't use it as often or as well as we should. This is why, when we ask a chatbot medical questions, we need to be extremely cautious about what it tells us. Because if the mental health advice it gives is any indication, we may end up making things worse should we blindly follow its lead and assume it knows what it's saying.
See: Rao A.S., Esmail K.P., et al. Large Language Model Performance on Clinical Reasoning Tasks. JAMA Netw Open. 2026;9(4). DOI: 10.1001/jamanetworkopen.2026.4003