What is it about?
COVID‑19 has shown how difficult it can be for doctors to quickly understand which patients are at higher risk of developing severe conditions. Hospitals routinely collect many different types of information during a patient’s stay, such as clinical notes, laboratory results, and chest X‑ray images. However, these data are often analyzed separately, making it harder to capture the full picture of a patient’s situation.

In this study, we developed a new artificial intelligence (AI) method that combines two types of information at the same time: the sequence of clinical events recorded during a patient’s hospitalization and the chest X‑ray images taken along the way. By integrating these two data sources, the AI can better predict whether a patient with COVID‑19 is likely to recover and go home or is at risk of dying. The system also provides explanations, showing which parts of the X‑ray or which pieces of clinical information influenced its prediction.

We tested this approach on a large real‑world dataset of hospitalized COVID‑19 patients and found that using both text and images together leads to more accurate predictions than analyzing either type of information alone. This kind of multimodal AI could help doctors make earlier and better‑informed decisions, supporting patient care while keeping human judgment at the center of the process.
Featured Image
Photo by Julien Tromeur on Unsplash
Why is it important?
What makes this work unique is its ability to combine, for the first time within Process Mining research, two very different types of clinical information—textual narratives of a patient’s clinical pathway and chest X‑ray images—into a single predictive model. Previous approaches typically analyzed these data sources in isolation, losing important relationships between how a patient’s condition evolves and what their medical images show. Our study introduces a multimodal AI system that processes both modalities jointly, enabling more complete and accurate predictions of COVID‑19 mortality risk than traditional methods.

The work is also timely. Healthcare systems are increasingly digitized, and large volumes of electronic health records and medical images are being collected every day. Yet current predictive tools still struggle to use this rich information effectively. Our approach leverages a state‑of‑the‑art multimodal foundation model (FLAVA), adapted to healthcare for the first time, to learn from the real sequence of events that make up a patient’s clinical pathway—not just static summaries. This allows the model to capture early warning patterns in patients with COVID‑19, a disease where timely decisions can save lives.

Finally, unlike many AI systems that operate as “black boxes,” our method provides clear explanations of its predictions, showing which parts of the X‑ray or which clinical events influenced the outcome. This transparency is essential for building trust in AI‑assisted decision‑making, especially in high‑stakes environments like hospital care. By improving predictive accuracy while keeping clinicians in the loop, this work can help support earlier interventions, more informed decisions, and better patient outcomes.
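To make the idea of "processing both modalities jointly" concrete, here is a minimal toy sketch of multimodal fusion: one encoder turns a sequence of clinical-event IDs into a vector, another turns an image into a vector, and the two are concatenated before a single classifier produces a risk score. This is purely illustrative and is not the authors' model—the paper uses the FLAVA foundation model, and every name, dimension, and weight below is an assumption made up for the example (the weights are random, so the output is not a meaningful prediction).

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_events(event_ids, vocab_size=16, dim=8):
    """Embed a sequence of clinical-event IDs and mean-pool into one vector."""
    embedding = rng.standard_normal((vocab_size, dim))
    return embedding[event_ids].mean(axis=0)

def encode_image(pixels, dim=8):
    """Stand-in image encoder: a random linear projection of flattened pixels."""
    projection = rng.standard_normal((pixels.size, dim))
    return pixels.flatten() @ projection

def predict_mortality_risk(event_ids, pixels):
    """Concatenate both modality vectors, then apply a logistic 'classifier'."""
    fused = np.concatenate([encode_events(event_ids), encode_image(pixels)])
    weights = rng.standard_normal(fused.size)
    logit = fused @ weights
    return 1.0 / (1.0 + np.exp(-logit))  # a probability in (0, 1)

# Example: a pathway of 5 clinical events plus a tiny 4x4 stand-in "X-ray".
risk = predict_mortality_risk(np.array([1, 3, 3, 7, 2]),
                              rng.standard_normal((4, 4)))
print(0.0 < risk < 1.0)
```

The key design point the sketch mirrors is that the classifier sees both modalities at once, so it can, in principle, learn interactions between the clinical pathway and the imaging—something separate per-modality models cannot do.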
Perspectives
What I find most inspiring in this work is the opportunity to explore how multimodal AI can be used not only to improve predictive accuracy but also to provide explanations that clinicians can actually interpret. Throughout the work, my co‑authors and I kept returning to the same question: how can we design AI that strengthens, rather than replaces, human judgment? This article is one attempt to push in that direction. I also see this publication as a step toward a broader vision: using AI responsibly to support healthcare systems that are under growing pressure. The pandemic has shown us the limits of current tools, but it has also created a space to rethink how data, technology, and clinical expertise can come together. I hope that this study encourages further exploration of AI models that are not only powerful but also transparent and respectful of the complexity of real medical decisions.
Prof. Donato Malerba
Università degli Studi di Bari Aldo Moro
Read the Original
This page is a summary of: Multimodal predictive process monitoring and its application to explainable clinical pathways, Information Systems, July 2026, Elsevier. DOI: 10.1016/j.is.2026.102698.