Study: AI chatbots may give unsafe medical advice

Press Release

A large study examining the use of AI chatbots for medical advice found that people using large language models did not make better health decisions than those relying on traditional sources and may be exposed to inaccurate and inconsistent guidance.

The randomized study, conducted by researchers at the University of Oxford, involved nearly 1,300 participants who were asked to assess medical scenarios and decide on appropriate next steps, such as seeing a general practitioner or going to a hospital. Participants who used large language models, or LLMs, performed no better than those who relied on online searches or their own judgment, according to a Feb. 9 news release on the findings.

Researchers said the results highlight a gap between the strong performance of LLMs on standardized medical knowledge tests and their reliability when used by people seeking advice about personal health concerns.

The study identified several challenges that affected decision-making. Participants often did not know what information to provide to the models, and the models produced different answers to similar questions and frequently mixed accurate guidance with poor recommendations. As a result, users struggled to identify the safest course of action.

The scenarios used in the study were developed by physicians and reflected common but potentially serious situations, including a severe headache after a night out and persistent breathlessness in a new mother. Researchers evaluated whether participants correctly identified likely medical issues and chose appropriate actions, such as visiting a GP or going to an emergency department.

The study also found that current evaluation methods for large language models fail to capture the complexity of real-world use. Models that performed well on benchmark tests often faltered when interacting with human users, researchers said.

Lead author Andrew Bean, a doctoral researcher at the Oxford Internet Institute, said in the release that the findings point to the need for more rigorous testing before AI systems are deployed for public use.

Senior author Adam Mahdi, an associate professor at the Oxford Internet Institute, said reliance on standardized testing alone is insufficient to determine whether AI tools are safe in high-stakes settings such as healthcare.

The study was published in Nature Medicine and conducted by researchers from the Oxford Internet Institute and the Nuffield Department of Primary Care Health Sciences at the University of Oxford, in partnership with MLCommons and other institutions.

The post Study: AI chatbots may give unsafe medical advice appeared first on Becker's Hospital Review | Healthcare News & Analysis.
