What is it about?

During COVID-19, many AI models were built to read chest CT scans and flag possible infection, and they often reported high accuracy. But accuracy alone can hide a problem: a model may be "cheating," basing its decision on irrelevant marks in the image, such as stray letters, colored annotations, or the table the patient lies on, instead of the lungs. We used several explainable-AI (XAI) techniques (Grad-CAM, LIME, RISE, Squaregrid, and gradient-based methods) to produce visual heatmaps showing where each network actually "looks." Comparing popular architectures (VGG16, DenseNet, EfficientNet) on a public COVID CT dataset, we found that VGG16 leaned heavily on these spurious artifacts, while DenseNet was far more robust. Strikingly, accuracy differences of less than 1% could flip a model from fixating on a stray mark to correctly focusing on lung tissue.
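
To give a feel for how a perturbation-based heatmap can expose this kind of shortcut, here is a minimal sketch in the spirit of RISE. It is not the code used in the study: `toy_model`, the 32×32 synthetic image, and all parameter values are hypothetical, and the masking is simplified (nearest-neighbour upsampling, per-pixel coverage normalization) compared with the published RISE method. The idea is to hide random parts of the image, record the model's score each time, and average the masks weighted by that score.

```python
import numpy as np

def rise_saliency(model, image, n_masks=500, grid=4, p_keep=0.5, seed=0):
    """Simplified RISE-style saliency map for a 2-D image.

    Each random mask keeps some grid cells and blanks the rest; pixels that
    tend to be visible when the score is high receive high saliency.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    cell_h, cell_w = -(-h // grid), -(-w // grid)  # ceiling division
    saliency = np.zeros((h, w))
    coverage = np.zeros((h, w))
    for _ in range(n_masks):
        # Coarse binary mask, upsampled to image size by block repetition.
        coarse = (rng.random((grid, grid)) < p_keep).astype(float)
        mask = np.kron(coarse, np.ones((cell_h, cell_w)))[:h, :w]
        score = model(image * mask)
        saliency += score * mask
        coverage += mask
    # Normalize by how often each pixel was visible (avoids division by zero).
    return saliency / np.maximum(coverage, 1e-8)

def toy_model(image):
    """Hypothetical biased 'classifier': its score depends only on a corner mark."""
    return float(image[:8, :8].mean())

# Synthetic stand-in for a scan: uniform background plus a bright corner mark.
image = np.full((32, 32), 0.2)
image[:8, :8] = 1.0

heatmap = rise_saliency(toy_model, image)
peak = np.unravel_index(np.argmax(heatmap), heatmap.shape)
```

In this toy setup the hottest region of `heatmap` lands on the corner mark rather than the rest of the image, which is the pattern one would look for when checking whether a real classifier attends to an annotation instead of lung tissue.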

Why is it important?

The work tackles a false sense of security: strong metrics (accuracy, F1, and AUC all in the 80–90% range) can coexist with serious bias. To our knowledge, this was the first study to compare this many XAI techniques side by side for bias detection in a real medical-image classifier, and to do so on the less-studied CT-scan case rather than chest X-rays. Three takeaways stand out: architecture choice affects robustness to bias; tiny accuracy gains can correspond to large, qualitative changes in what a network learns; and no single XAI method tells the whole story, because each one probes a different facet of the model and they are most informative when used in tandem. The practical message is that XAI heatmaps belong on the front line of validating medical-imaging AI, not as an afterthought.

Perspectives

What draws me to this work is the gap between a model that looks accurate and a model that is trustworthy. A 90% AUC is reassuring; asking what the network actually learned is harder and more important. Watching a classifier confidently latch onto a stray red letter instead of lung tissue was a vivid reminder that performance metrics alone are never enough for medical AI. For me it reinforced a conviction that explainability must be a routine part of validating diagnostic models — a direction I continue to pursue in digital pathology and other clinical imaging domains.

Prof. Dr. Eduardo Costa da Silva
Pontifícia Universidade Católica do Rio de Janeiro

Read the Original

This page is a summary of: Explainable Artificial Intelligence for Bias Detection in COVID CT-Scan Classifiers, Sensors, August 2021, MDPI AG,
DOI: 10.3390/s21165657.
