What is it about?
During labor and delivery, doctors must rapidly assess a large number of risk factors to determine whether a baby is at risk for serious complications such as cerebral palsy, low Apgar scores, or dangerous drops in umbilical cord pH. Making these judgments quickly and accurately is challenging, and existing AI tools often provide predictions without any explanation of their reasoning.
This paper presents AIMEN (Artificial Intelligence for Modeling and Explaining Neonatal Health), a deep learning system that predicts the risk of adverse labor outcomes using 34 clinical risk factors spanning maternal health, fetal conditions, obstetrical events, and delivery characteristics. What makes AIMEN distinctive is that it does not just flag a case as high-risk; it also generates "what-if" explanations called counterfactual examples. These show clinicians which specific factors, if changed, would shift the prediction from abnormal to normal. On average, only 2 to 3 factor changes are needed to flip a prediction, making the explanations concise and actionable.
To overcome the challenge of small and heavily imbalanced clinical datasets (only 112 abnormal cases out of 1,457 total), AIMEN uses a Conditional Tabular GAN (CTGAN) to generate synthetic training data. An ensemble of 8 neural networks is then trained on this augmented data, with predictions combined through weighted voting. AIMEN outperforms established models including XGBoost, LightGBM, TabNet, and DANet, achieving a macro-averaged F1 score of 0.784 on real, unseen test data.
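To make the ideas of weighted voting and counterfactual "what-if" explanations concrete, here is a minimal Python sketch. It is not AIMEN's implementation: the two hand-written scoring functions stand in for trained neural networks, the three feature names, the weights, and the 0.5 threshold are hypothetical, and the brute-force search is a simple substitute for whatever counterfactual method the paper actually uses. It only illustrates how a small set of factor changes can flip a combined prediction from abnormal to normal.

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Sequence

Case = Dict[str, float]
Model = Callable[[Case], float]  # returns a probability of an abnormal outcome


def weighted_vote(models: Sequence[Model], weights: Sequence[float], case: Case) -> float:
    """Combine the models' probabilities as a weighted average."""
    total = sum(weights)
    return sum(w * m(case) for m, w in zip(models, weights)) / total


def find_counterfactual(
    predict: Callable[[Case], float],
    case: Case,
    alternatives: Case,
    threshold: float = 0.5,
    max_changes: int = 3,
) -> Optional[Case]:
    """Brute-force search for the smallest set of feature changes that
    brings the predicted probability below the threshold.
    Returns the changed features, {} if already normal, or None if not found."""
    if predict(case) < threshold:
        return {}
    for k in range(1, max_changes + 1):  # try the sparsest explanations first
        for features in combinations(sorted(alternatives), k):
            candidate = dict(case)
            for name in features:
                candidate[name] = alternatives[name]
            if predict(candidate) < threshold:
                return {name: alternatives[name] for name in features}
    return None


# Hypothetical stand-ins for two trained ensemble members (toy formulas).
def model_a(c: Case) -> float:
    return 0.1 + 0.4 * c["maternal_fever"] + 0.4 * c["prolonged_labor"]


def model_b(c: Case) -> float:
    return 0.1 + 0.3 * c["prolonged_labor"] + 0.5 * c["meconium"]


models = [model_a, model_b]
weights = [0.6, 0.4]  # hypothetical voting weights


def ensemble(c: Case) -> float:
    return weighted_vote(models, weights, c)


# A toy case with all three hypothetical risk factors present (1.0 = present).
case = {"maternal_fever": 1.0, "prolonged_labor": 1.0, "meconium": 1.0}
# The "what-if" value to try for each factor (0.0 = absent).
alternatives = {"maternal_fever": 0.0, "prolonged_labor": 0.0, "meconium": 0.0}

risk = ensemble(case)
changes = find_counterfactual(ensemble, case, alternatives)
print(f"predicted risk: {risk:.2f}")
print(f"changes needed to flip the prediction: {changes}")
```

In this toy case no single change brings the combined score below the threshold, but one pair of changes does, so the explanation involves two factors. Searching the smallest change sets first is what keeps such explanations sparse. In practice, a clinical tool would also need to restrict the search to factors that are realistic to change.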
Why is it important?
Adverse labor outcomes such as cerebral palsy are lifelong and often preventable with timely intervention. Electronic fetal monitoring has been the standard clinical tool for over 50 years, yet it is well established that many other risk factors contribute to poor outcomes, and no reliable automated system currently integrates them all for real-time decision support. AIMEN addresses this gap in three ways that are timely and clinically significant. First, it combines predictive accuracy with interpretability, a combination that is increasingly required by regulators and clinical stakeholders as AI moves into high-stakes medical settings. Second, it tackles the practical reality of small and imbalanced clinical datasets through principled synthetic data generation, providing a reusable methodological framework for other rare-outcome prediction problems in medicine. Third, its counterfactual explanations are sparse enough (averaging 2.5 feature changes) to be actionable, matching human preferences for explainable AI identified in prior user studies. The work was developed in close collaboration with obstetricians, directly responding to documented physician skepticism about AI in labor and delivery. It lays the groundwork for a clinically deployable, transparent, and trustworthy decision-support tool that could reduce preventable birth injuries and associated medical liability.
Read the Original
This page is a summary of: Use of What-if Scenarios to Help Explain Artificial Intelligence Models for Neonatal Health, ACM Transactions on Computing for Healthcare, May 2026, ACM (Association for Computing Machinery), DOI: 10.1145/3814951.