What is it about?

Humans are very good at noticing when something important is missing from a scene. We can tell when an object should be there but is not, when a person is absent from a social situation, or when a story told in a sequence of images feels incomplete. Modern vision-language models are usually tested on what they can see. In this study we asked whether they can also understand what is not shown.

What we did
We created a small but carefully designed set of images and short image sequences, each containing a meaningful absence. Some show missing physical objects, some show gaps in social situations, and some stop just before an event that viewers would naturally expect. Humans and the model GPT-4.1 received the same questions about each case, and their answers were compared to ideal explanations using semantic similarity methods.

What we found
The model was very strong in social scenes, where patterns are stable and easy to compare; it often matched or exceeded the average human answer in these cases. Humans performed better when the missing element was small, subtle, or hidden among many visual details. Both humans and the model found temporal scenes more challenging, since these require narrative understanding and sensitivity to emotional cues. The model in particular struggled when the key clue was small but essential to the story.

Why it matters
Understanding what is missing from a scene is an important part of human perception and social reasoning. Our study shows that current models are beginning to approach this ability in structured social cases but still fall short in situations that require deeper narrative insight or sensitivity to small but meaningful details. This work provides a first benchmark and helps guide future research on more human-aligned visual reasoning.


Why is it important?

Artificial intelligence is becoming part of everyday tools, yet we know little about how well these systems understand what is missing from a scene. This ability is central to human perception and social understanding. By showing where AI performs well and where it still fails, our study gives a clearer picture of how ready these technologies are for real-world use. It also provides a new set of test cases that others can build on to improve visual reasoning in AI. This makes the work timely for anyone who wants safer, more reliable, and more human-aware AI systems.

Read the Original

This page is a summary of: Missing Pieces: How Humans and GPT-4.1 Detect Absence and Predict the Unseen, October 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3746276.3760466.
You can read the full text via the DOI above.
