What is it about?

When evaluating explanations of artificial intelligence (AI) models, saliency maps show which parts of the input were most important to a decision. These heat maps are typically compared against a known ground truth to measure the overlap: the more overlap, the better the explanation is judged to be. We propose asking users to select the areas they find most important, and using those selections to define graduated maps of where humans pay the most attention. Our benchmark validates that these graduated maps capture different information than the ground-truth baseline.
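The two kinds of comparison described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in the paper: the function names, the 0.5 threshold, the toy 3x3 data, the choice to average user selections into a graduated map, and the use of intersection-over-union and mean absolute error as scores are all hypothetical choices made for this example.

```python
# Illustrative sketch only; metrics, names, and data are assumptions,
# not taken from the paper.

def binarize(saliency, threshold=0.5):
    """Turn a saliency map with values in [0, 1] into a 0/1 mask."""
    return [[1 if v >= threshold else 0 for v in row] for row in saliency]

def iou(mask_a, mask_b):
    """Intersection over union of two binary masks of equal shape."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a & b
            union += a | b
    return inter / union if union else 1.0

def graduated_map(selections):
    """Average several users' binary selections into one map in [0, 1]."""
    n = len(selections)
    rows, cols = len(selections[0]), len(selections[0][0])
    return [[sum(s[r][c] for s in selections) / n for c in range(cols)]
            for r in range(rows)]

def mean_abs_error(map_a, map_b):
    """Mean absolute difference between two maps of equal shape."""
    diffs = [abs(a - b)
             for row_a, row_b in zip(map_a, map_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

# Toy saliency map produced by some explanation method (assumed values).
saliency = [[0.9, 0.8, 0.1],
            [0.7, 0.2, 0.0],
            [0.1, 0.0, 0.0]]

# Single-layer ground truth: one binary mask.
ground_truth = [[1, 1, 0],
                [1, 1, 0],
                [0, 0, 0]]

# Selections from three hypothetical users.
user_selections = [
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[1, 1, 0], [0, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, 0, 0], [0, 0, 0]],
]

overlap_score = iou(binarize(saliency), ground_truth)
human_map = graduated_map(user_selections)
human_error = mean_abs_error(saliency, human_map)

print("IoU against binary ground truth:", overlap_score)
print("Mean absolute error against graduated human map:", human_error)
```

In this toy setup the binary mask treats every marked cell as equally important, while the graduated map gives more weight to cells that more users selected, which is the kind of extra information a single-layer mask cannot express.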


Why is it important?

We demonstrate how to capture crowdsourced attention in text and image domains. We confirm that these maps contain different information than typical pixel-wise ground truth baselines and also show how they can be used to extract and examine human biases in a dataset.


This paper's message may be hard to distill because of terms like "multilayer human-attention benchmark" and "single-layer ground truth mask", but it proposes and validates a seemingly intuitive, deductive approach for evaluating Explainable Artificial Intelligence (XAI) tools.

Jeremy Block
University of Florida

Read the Original

This page is a summary of: Quantitative Evaluation of Machine Learning Explanations: A Human-Grounded Benchmark, April 2021, ACM (Association for Computing Machinery),
DOI: 10.1145/3397481.3450689.
