Why AI Can't Provide the Right Diagnosis Despite Knowing More Than Doctors

Artificial intelligence chatbots have demonstrated impressive capabilities, even outperforming many doctors in medical examinations. However, when it comes to real-world applications, these systems often fall short, leaving users with mixed results when trying to identify their symptoms. A comprehensive study published in Nature Medicine reveals that the disconnect lies not in the AI's medical knowledge but in the way people interact with these systems.

Conducted by researchers from the University of Oxford alongside MLCommons and other institutions, the study involved nearly 1,300 participants, each given a description of one of ten common medical scenarios. Some participants consulted AI chatbots such as GPT-4o, Llama 3, and Command R+, while a control group relied on traditional sources of information. After their interactions, participants were asked to identify the likely illness and where to seek help. The results were striking: tested alone, the chatbots identified the correct conditions with 94.9% accuracy, but when real users engaged with them, that figure fell below 34.5%. In fact, chatbot users performed no better than the control group that did not use AI at all.

This paradox highlights a crucial issue: while chatbots excel in structured environments like exams, they struggle in real-life conversations with everyday users. The picture is further complicated by high-profile cases in which systems like ChatGPT diagnosed conditions that had stumped medical professionals, fueling public faith in "chat medicine."

The study's findings indicate that while language models excel at answering exam questions, they falter in the nuanced exchange of a doctor-patient consultation. AI models receive clearly defined questions with all necessary information during exams, but in reality, patients often provide incomplete details and misinterpret the AI's responses. The lack of effective communication between humans and machines is a significant barrier.

For instance, patients may omit crucial symptoms, failing to realize their significance, or they might misinterpret the AI's suggestions. Experts argue that chatbots should be equipped to ask clarifying questions similar to how doctors do. The responsibility of conveying complete information should not solely rest on the users, as the AI must also guide the conversation effectively.

Doctors, in contrast, utilize a wealth of training and experience to understand and empathize with their patients, employing the Calgary-Cambridge model, which emphasizes building trust, gathering information through precise inquiries, and collaboratively deciding on treatment plans. These skills are essential, as medical consultation is as much an art as it is a science.

Despite these challenges, AI has its place in healthcare. A growing number of American adults, approximately one in six, are turning to AI chatbots for medical information at least once a month. Major companies like OpenAI and Anthropic are developing specialized medical versions of their chatbots, which may yield different outcomes in future studies. However, for now, these tools are not ready for seamless integration into patient care.

The key takeaway from this research is the stark contrast between scoring well on tests and providing effective assistance to real patients. Just as passing a theoretical driving test does not guarantee a person will be a competent driver, excellent performance on medical assessments does not translate into reliable diagnostics. Empathy, adaptability, and the ability to navigate the complexities of patient communication remain firmly in the human domain, underscoring the ongoing need for skilled medical professionals. The finding also signals a significant hurdle for health-tech companies striving to improve AI-human interaction in medical settings.

" content="b3bec31a494fc878" />