“Dr. Google” had its problems. Can ChatGPT Health work better?

by SkillAiNest

Some doctors see LLMs as a boon for medical literacy. The average patient can struggle to navigate the vast landscape of medical information online. Marc Succi, an associate professor at Harvard Medical School and a practicing radiologist, says that treating patients who search for their symptoms on Google “requires a lot of tackling the patient’s anxiety.” But now, he says, “you see college-educated, high-school-educated patients asking questions at the beginning med student level.”

ChatGPT Health’s release, and Anthropic’s subsequent announcement of new health integrations for Claude, indicate that the AI giants are increasingly willing to recognize and encourage health-related uses of their models. Such uses certainly come with risks, given the well-documented tendency of LLMs to make up information rather than admit ignorance.

But these risks also have to be weighed against the potential benefits. There’s an analogy with autonomous vehicles: when policymakers consider whether to allow Waymo in their city, the key metric is not whether its cars are never involved in accidents but whether they cause less harm than human drivers. If Dr. ChatGPT is an improvement over Dr. Google (and early evidence suggests it might be), it could reduce the enormous burden of medical misinformation and unnecessary health anxiety the internet has created.

Determining the effectiveness of a chatbot such as ChatGPT or Claude for consumer health, however, is difficult. “It’s very difficult to evaluate an open-ended chatbot,” says Danielle Bitterman, clinical lead for data science and AI at the Mass General Brigham health system. Large language models score well on medical licensing exams, but those exams use multiple-choice questions that don’t reflect how people actually use chatbots to find medical information.

Sirisha Rambhatla, an assistant professor of management science and engineering at the University of Waterloo, tried to close that gap by assessing how GPT-4o responded to licensing-exam questions when the list of possible answers was withheld. Clinicians who evaluated the model’s responses rated only about half of them as completely accurate. But multiple-choice exam questions are deliberately tricky in ways that don’t disappear when the answer options are removed, and they’re still a far cry from anything a user would type into ChatGPT.

A different study, which tested GPT-4o on more realistic prompts submitted by human volunteers, found that it answered medical questions correctly 85% of the time. When I spoke with Amulya Yadav, an associate professor at Pennsylvania State University who runs the Responsible AI for Social Emancipation (RAISE) Lab and led the study, he made it clear that he is not personally a fan of patient-facing medical LLMs. But he freely admits that, technically, they seem up to the task; after all, he says, human doctors misdiagnose patients 10% to 15% of the time. “If I look at it dispassionately, it looks like the world is going to change, whether I like it or not,” he says.

For seeking medical information online, says Yadav, LLMs seem to be a better choice than Google. Succi, the radiologist, reached a similar conclusion when he compared GPT-4’s responses to questions about common chronic medical conditions with the information presented in Google’s knowledge panel, the information box that sometimes appears to the right of search results.

Since Yadav’s and Succi’s studies were published online, in the first half of 2025, OpenAI has released several new versions of GPT, and it is reasonable to expect that GPT-5.2 will perform even better than its predecessors. But the studies have significant limitations: they focus on straightforward, factual questions, and they examine only brief interactions between users and chatbots or web search tools. Some of LLMs’ weaknesses, particularly their tendencies toward sycophancy and hallucination, are more likely to rear their heads in extended conversations and with users dealing with more complex problems. Reeva Lederman, a professor at the University of Melbourne who studies technology and health, notes that patients who receive an opinion from a doctor sometimes seek a second opinion from an LLM, and a sycophantic LLM might encourage them to reject their doctor’s advice.

Some studies have found that LLMs do exhibit sycophancy and hallucination in response to health prompts. For example, one study found that GPT-4 and GPT-4o will happily accept and run with incorrect drug information included in a user’s query. In another, GPT-4o made up plausible-sounding definitions for fake syndromes and lab tests that appeared in consumer-style prompts. Given the abundance of medically dubious diagnoses and treatments floating around the internet, these patterns of behavior could contribute to the spread of medical misinformation, especially if people perceive LLMs as trustworthy.

OpenAI reports that the GPT-5 series of models is markedly less sycophantic and less prone to hallucination than its predecessors, so the results of these studies may not apply to ChatGPT Health. The company also used its publicly available HealthBench benchmark to evaluate the model’s responses to health questions. HealthBench rewards models that express uncertainty when appropriate, advise users to seek medical attention when needed, and avoid creating unnecessary stress by telling users their condition is more serious than it really is. It’s reasonable to assume that the model underlying ChatGPT Health exhibited these behaviors in testing, although Bitterman notes that some of the prompts in HealthBench were generated by LLMs, not users, which may limit how well the benchmark translates to the real world.
