Key points from the article:
Researchers have highlighted a critical issue with OpenAI's Whisper AI transcription tool, which is used in medical settings to convert speech to text. Although Whisper is widely deployed for medical transcription in hospitals, it occasionally "hallucinates," inventing passages, particularly when it encounters moments of silence.
These hallucinations include made-up sentences or phrases that don't align with the context. Hospitals widely use Whisper through a tool built by a company called Nabla, with over 30,000 clinicians and 40 health systems relying on it to manage patient conversations. In total, Whisper has processed approximately 7 million medical conversations, according to a report by ABC News.
A study led by researchers from Cornell University and the University of Washington examined Whisper's tendency to hallucinate during silences, which is especially noticeable when transcribing conversations involving people with aphasia, a language disorder marked by pauses in speech. The team collected speech samples from TalkBank's AphasiaBank and found that Whisper generated unrelated sentences, sometimes with violent or nonsensical content. In one notable instance, Whisper produced phrases like "Thank you for watching!", likely because OpenAI reportedly trained the model on vast amounts of YouTube videos to improve its transcription accuracy.
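The silence-triggered behavior can be probed with the open-source whisper package itself. The following is a minimal sketch, not the researchers' methodology: the audio file name and the 0.6 threshold are hypothetical, and flagging segments by the model's own no_speech_prob is only a rough heuristic for text produced over near-silent audio.

```python
# Minimal sketch: transcribe a silence-heavy clip and flag segments where the
# model reports a high probability of "no speech" yet still emitted text.
import whisper

model = whisper.load_model("base")  # smaller checkpoint for a quick test
result = model.transcribe("silence_heavy_clip.wav")  # hypothetical file

# Each segment includes `no_speech_prob`, the model's own estimate that the
# segment contains no speech. Non-empty text paired with a high value is a
# rough signal that the output may be hallucinated filler.
for seg in result["segments"]:
    if seg["no_speech_prob"] > 0.6 and seg["text"].strip():
        print(f"[{seg['start']:.1f}-{seg['end']:.1f}s] "
              f"possible hallucination: {seg['text']!r} "
              f"(no_speech_prob={seg['no_speech_prob']:.2f})")
```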
The researchers presented these findings at the Association for Computing Machinery’s FAccT conference in Brazil. Nabla acknowledged the problem and is reportedly working to address these hallucinations.
An OpenAI spokesperson said the company is committed to reducing hallucinations and improving the model, and noted that its usage policies advise against using Whisper in critical decision-making contexts, given its limitations. The findings spotlight the challenges AI models face in high-stakes environments and underscore the need for cautious implementation, especially in healthcare, where accuracy is paramount.