Explainable Artificial Intelligence for Bias Detection in COVID CT-Scan Classifiers

Iam Palatnik de Sousa; Marley M. B. R. Vellasco; Eduardo Costa da Silva

doi:10.3390/s21165657

What is it about?

During COVID-19, many AI models were built to read chest CT scans and flag possible infection, and they often reported high accuracy. But accuracy alone can hide a problem: a model may be "cheating," basing its decision on irrelevant marks in the image — stray letters, colored annotations, or the table the patient lies on — instead of the lungs. We used several explainable-AI (XAI) techniques (GradCAM, LIME, RISE, Squaregrid, and gradient-based methods) to produce visual heatmaps showing where each network actually "looks." Comparing popular architectures (VGG16, DenseNet, EfficientNet) on a public COVID CT dataset, we found that VGG16 leaned heavily on these spurious artifacts, while DenseNet was far more robust. Strikingly, accuracy differences of less than 1% could flip a model from fixating on a stray mark to correctly focusing on lung tissue.
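
To make the idea of a heatmap concrete, here is a minimal, simplified sketch of RISE, one of the techniques listed above, written in Python with NumPy. It is not the authors' implementation. The toy image, the `cheating_model` classifier and the parameter values are all hypothetical. The masks are also upsampled by simple cell repetition, not by the bilinear upsampling with random shifts used in the original RISE method. The core idea is to hide random parts of the image and record the model's score for each masked copy. Averaging the masks, weighted by those scores, makes the regions that drive the prediction stand out.

```python
import numpy as np

def rise_saliency(predict, image, n_masks=1000, grid=8, p_keep=0.5, seed=0):
    """Simplified RISE saliency map for a single-channel image.

    predict: callable mapping a batch of images (N, H, W) to class scores (N,).
    Simplification (assumption): masks are upsampled by repeating cells
    instead of bilinear upsampling with random shifts as in the original RISE.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    cell_h, cell_w = h // grid, w // grid
    # Random coarse binary masks: each grid cell is kept with probability p_keep.
    coarse = (rng.random((n_masks, grid, grid)) < p_keep).astype(float)
    # Upsample every coarse mask to full resolution by repeating its cells.
    masks = coarse.repeat(cell_h, axis=1).repeat(cell_w, axis=2)
    # Score each masked copy of the image with the model.
    scores = predict(masks * image[None, :, :])
    # Score-weighted sum of masks, normalised by the expected number of keeps.
    return np.tensordot(scores, masks, axes=1) / (n_masks * p_keep)

# Hypothetical 64x64 "scan": a lung-like region plus a stray corner mark.
img = np.zeros((64, 64))
img[20:44, 16:48] = 0.6   # stand-in for lung tissue
img[0:8, 0:8] = 1.0       # stand-in for a stray annotation mark

def cheating_model(batch):
    # Hypothetical biased classifier: its score depends only on the corner mark.
    return batch[:, 0:8, 0:8].mean(axis=(1, 2))

sal = rise_saliency(cheating_model, img)
print(sal[0:8, 0:8].mean() > sal[20:44, 16:48].mean())  # True
```

Because the toy model only responds to the corner mark, the heatmap is close to 1.0 over that mark and close to 0.5 everywhere else, including the "lung" region. This mirrors, in miniature, a model that fixates on a stray mark instead of lung tissue.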

Why is it important?

The work tackles a false sense of security: strong metrics (accuracy, F1, AUC all in the 80–90% range) can coexist with serious bias. To the authors' knowledge, this was the first study to compare this many XAI techniques side by side for bias detection in a real medical-image classifier, and to do so on the less-studied CT-scan case rather than chest X-rays. Three takeaways stand out: architecture choice affects robustness to bias; tiny accuracy gains can correspond to large, qualitative changes in what a network learns; and no single XAI method tells the whole story — using several in tandem probes different facets of the model. The practical message is that XAI heatmaps belong on the front line of validating medical-imaging AI, not as an afterthought.
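
One way to turn a heatmap into a number during validation is to measure how much of the saliency mass falls inside a lung mask. This is offered as a hypothetical illustration, not as a procedure from the paper. The sketch assumes a saliency map and a boolean lung mask of the same shape are already available. The helper `fraction_inside` and the two synthetic heatmaps are made up for illustration.

```python
import numpy as np

def fraction_inside(saliency: np.ndarray, region_mask: np.ndarray) -> float:
    """Share of the non-negative saliency mass that lies inside region_mask."""
    s = np.clip(saliency, 0.0, None)  # drop negative attributions, if any
    total = s.sum()
    if total == 0:
        return 0.0
    return float(s[region_mask].sum() / total)

# Hypothetical 64x64 lung mask.
lung = np.zeros((64, 64), dtype=bool)
lung[20:44, 16:48] = True

# Two hypothetical heatmaps for the same scan.
focused = np.zeros((64, 64))
focused[24:40, 20:44] = 1.0   # attention on lung tissue
biased = np.zeros((64, 64))
biased[0:8, 0:8] = 1.0        # attention on a corner annotation mark

print(fraction_inside(focused, lung))  # 1.0
print(fraction_inside(biased, lung))   # 0.0
```

A low score flags a model worth inspecting visually, but it does not replace looking at the heatmaps themselves. Different XAI methods can also give different scores for the same model, which is one more reason to use several of them.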

Perspectives

What draws me to this work is the gap between a model that looks accurate and a model that is trustworthy. A 90% AUC is reassuring; asking what the network actually learned is harder and more important. Watching a classifier confidently latch onto a stray red letter instead of lung tissue was a vivid reminder that performance metrics alone are never enough for medical AI. For me it reinforced a conviction that explainability must be a routine part of validating diagnostic models — a direction I continue to pursue in digital pathology and other clinical imaging domains.
Prof. Dr. Eduardo Costa da Silva
Pontifícia Universidade Católica do Rio de Janeiro

This page is a summary of: Explainable Artificial Intelligence for Bias Detection in COVID CT-Scan Classifiers, Sensors, August 2021, MDPI AG, DOI: 10.3390/s21165657.

Does AI read lungs or labels? Explainable AI reveals hidden bias in COVID CT-scan classifiers