
AI models for hospital care fail to detect critical conditions

New testing methods showed that AI struggles with real-world medical emergencies

11-Mar-2025

Key points from the article:

Doctors in intensive care units rely on accurate predictions to catch when a patient's condition worsens. Machine learning models are designed to help with this, but a recent Virginia Tech study published in Communications Medicine found that these models often fail to detect critical conditions. In particular, models predicting in-hospital mortality failed to recognize 66 percent of the injuries that could lead to a patient's death, making them unreliable for life-saving decisions.

Researchers at Virginia Tech investigated how well these models respond to medical emergencies. “Predictions are only valuable if they can accurately recognize critical patient conditions,” one researcher said, emphasizing the importance of timely alerts for doctors. However, their research found that many machine learning models struggle to detect life-threatening changes in a patient’s health.

To conduct the study, the research team collaborated with Ph.D. students and experts from the University of Arizona College of Medicine. They tested machine learning models on multiple datasets to assess their ability to predict serious medical conditions. Their results showed that patient data alone is not enough to train these systems effectively.

The team introduced new ways to test the models, including a neural activation map that highlights how well a model detects health deterioration. They also developed a gradient ascent method, which automatically generates test cases to evaluate model performance. “We systematically assessed machine learning models’ ability to respond to serious medical conditions using new test cases,” a researcher said. These tests showed that the models failed in several critical areas, including breast and lung cancer prognosis.
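The article does not include the researchers' code, but the gradient-ascent idea can be illustrated with a short, hedged sketch: starting from a patient record, the input features are nudged in the direction that most increases the model's predicted risk, producing a synthetic "deteriorating" test case, and the model's response is then checked. The toy model, feature set, and threshold below are assumptions for illustration, not the study's implementation.

```python
# Hedged sketch of gradient-ascent test-case generation (not the authors' code).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy mortality-risk model standing in for a trained clinical model.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
model.eval()

# One synthetic patient record: [heart_rate, systolic_bp, spo2, lactate]
# (feature choice is an assumption; real studies use far richer inputs).
x = torch.tensor([[88.0, 115.0, 0.97, 1.2]], requires_grad=True)
baseline_risk = model(x).item()

# Gradient ascent on the input: move features in the direction that most
# increases the model's predicted risk, synthesizing a "deteriorating" case.
optimizer = torch.optim.SGD([x], lr=0.5)
for _ in range(50):
    optimizer.zero_grad()
    risk = model(x).sum()
    (-risk).backward()   # minimizing -risk == ascending the risk surface
    optimizer.step()

perturbed_risk = model(x).item()

print(f"baseline risk:  {baseline_risk:.3f}")
print(f"perturbed risk: {perturbed_risk:.3f}")
# A model that responds to deterioration should assign clearly higher risk
# to the perturbed case; a flat response flags a potential blind spot.
if perturbed_risk - baseline_risk < 0.1:
    print("Warning: model barely reacts to the synthesized deterioration case.")
```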

The researchers argue that training AI models solely on patient data creates dangerous blind spots. They suggest improving accuracy by introducing synthetic data and integrating medical expertise into model design. “A more fundamental design is to incorporate medical knowledge deeply into clinical machine learning models,” one researcher explained, pointing out the need for collaboration between computing and medical professionals.
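As a hedged illustration of what introducing synthetic data guided by medical knowledge might look like in practice, one approach is to augment a training set with rule-derived deterioration cases, so the model sees explicit examples of patterns a clinician would flag. The features, thresholds, and labels below are placeholders for illustration, not the researchers' method or clinical guidance.

```python
# Hedged sketch: augmenting training data with rule-derived synthetic cases.
import numpy as np

rng = np.random.default_rng(0)

def synthetic_deterioration_cases(n):
    """Generate records a simple clinical rule would flag as critical.
    Feature order: [heart_rate, systolic_bp, spo2, lactate]; the ranges
    below are illustrative placeholders, not clinical thresholds."""
    hr = rng.uniform(130, 170, n)        # marked tachycardia
    sbp = rng.uniform(60, 85, n)         # hypotension
    spo2 = rng.uniform(0.70, 0.88, n)    # hypoxemia
    lactate = rng.uniform(4.0, 10.0, n)  # elevated lactate
    X = np.column_stack([hr, sbp, spo2, lactate])
    y = np.ones(n)                       # labeled high risk by the rule
    return X, y

# Placeholder "real" training data standing in for hospital records.
X_real = rng.normal([85, 115, 0.96, 1.5], [10, 15, 0.02, 0.5], size=(500, 4))
y_real = rng.integers(0, 2, 500).astype(float)

# Combine real and synthetic cases before training any downstream model.
X_syn, y_syn = synthetic_deterioration_cases(100)
X_train = np.vstack([X_real, X_syn])
y_train = np.concatenate([y_real, y_syn])
print(X_train.shape, y_train.shape)
```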

The team is now expanding their testing to other AI models, including large language models, to see how well they perform in time-sensitive medical tasks like detecting sepsis. “AI safety testing is a race against time, as companies are pouring products into the medical space,” a researcher said. They stressed that rigorous and transparent testing is necessary to ensure AI tools in health care do more good than harm.

