
Unedited AI medical responses could put patients at risk

Artificial intelligence (AI) is increasingly being used in a wide range of medical applications, including the diagnosis of patients and the development of new drugs.

Another use is in communicating with patients and streamlining administrative tasks.

These tasks are well suited to the type of generative AI known as large language models (LLMs), the technology behind the well-known ChatGPT.

Now, researchers from US not-for-profit integrated healthcare system Mass General Brigham have evaluated the potential benefits, risks and limitations of using LLMs in such a setting.

The research was prompted by concerns about the potential risks associated with integrating LLMs into messaging systems, based on the team’s real-world experience.

The resulting study, published in The Lancet Digital Health, found that LLMs could help to reduce the administrative burden on physicians, potentially reducing burnout.

LLMs could also improve patient education when used to draft replies to patient messages, but the researchers also noted some limitations that could affect patient safety.

They found that messages composed entirely by AI could put patients at risk, meaning that a human must be kept in the loop to review the communications and edit them when needed.

Researchers used OpenAI’s GPT-4 to respond to pre-generated user questions

The team used OpenAI’s GPT-4 system, first prompting it to generate 100 scenarios and related questions about symptoms from patients with cancer.

The scenario and question pairs were all reviewed and edited by a cancer specialist to make sure that they represented realistic clinical pictures.

GPT-4 was then separately prompted to produce a response to the simulated patient question.
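As an illustration only, a minimal sketch of that two-step prompting workflow using OpenAI’s Python SDK might look like the following; the prompt wording, model name and function names here are assumptions, not the study’s actual code.

```python
# Illustrative sketch only: the article does not publish the study's code or
# prompts, so the prompt wording and function names below are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def generate_scenario_and_question() -> str:
    """Step 1 (assumed): prompt GPT-4 to invent a realistic cancer-care
    scenario plus a patient question about symptoms, for specialist review."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": (
                    "Write a short, realistic clinical scenario for a patient "
                    "receiving cancer treatment, followed by a question that "
                    "patient might send to their care team about symptoms."
                ),
            }
        ],
    )
    return response.choices[0].message.content


def draft_reply(patient_question: str) -> str:
    """Step 2 (assumed): a separate prompt asking GPT-4 to draft a reply,
    which a clinician reviews and edits before anything reaches a patient."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "You draft replies to patient portal messages for "
                    "clinician review; drafts are never sent unedited."
                ),
            },
            {"role": "user", "content": patient_question},
        ],
    )
    return response.choices[0].message.content
```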

Six radiation oncologists were also asked to respond to the messages as they normally would in a professional capacity, and then to edit the AI responses (LLM drafts) so that they were clinically acceptable to send to a patient.

The oncologists were not told whether the responses had been written by AI or by humans, and just under a third (31%) of the AI drafts were mistaken for human-written responses.

The AI-drafted responses tended to include more educational background for patients, but were ‘less directive’ with their instructions.

The reviewers deemed 82.1% of the AI-drafted responses ‘safe’, and judged 58.3% acceptable to send without further editing.

However, 7.1% could have posed a risk to the patient and 0.6% could have posed a risk of death, most often because the AI failed to instruct the patient to seek immediate medical care.

