What is it about?

AttBalance is a loss function for specialist models in Visual Grounding that better aligns vision and language features. It concentrates the attention values within the ground-truth bounding boxes, while using the attention map from a momentum version of the model to rectify this regularization.
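The idea can be illustrated with a minimal NumPy sketch. This is not the formulation from the paper: the function names, the blending weight `beta`, the cross-entropy form, and the EMA `decay` are all assumptions chosen for illustration. It shows one plausible way to reward attention mass inside a ground-truth box, soften that target with a momentum model's attention map, and keep the momentum model as a moving average of the trained one.

```python
import numpy as np


def box_mask(height, width, box):
    """Binary mask that is 1 inside box = (x0, y0, x1, y1), upper bounds exclusive."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.float64)
    mask[y0:y1, x0:x1] = 1.0
    return mask


def normalize(a, eps=1e-12):
    """Scale a non-negative map so that it sums to (approximately) 1."""
    return a / (a.sum() + eps)


def attention_in_box_loss(attn, mask, eps=1e-12):
    """Negative log of the attention mass that falls inside the box."""
    return -np.log((normalize(attn) * mask).sum() + eps)


def rectified_target(mask, momentum_attn, beta=0.5):
    """Hypothetical rectification: blend the box prior with the momentum attention."""
    return (1.0 - beta) * normalize(mask) + beta * normalize(momentum_attn)


def rectified_loss(attn, mask, momentum_attn, beta=0.5, eps=1e-12):
    """Cross-entropy between the rectified target and the model's attention map."""
    target = rectified_target(mask, momentum_attn, beta)
    return -(target * np.log(normalize(attn) + eps)).sum()


def ema_update(momentum_params, params, decay=0.99):
    """Exponential moving average, a common way to maintain a momentum model."""
    return decay * momentum_params + (1.0 - decay) * params


if __name__ == "__main__":
    mask = box_mask(4, 4, (0, 0, 2, 2))
    uniform = np.ones((4, 4))
    focused = mask + 1e-3  # most attention inside the box
    print(attention_in_box_loss(uniform, mask))
    print(attention_in_box_loss(focused, mask))
    print(rectified_loss(uniform, mask, momentum_attn=focused))
    print(rectified_loss(focused, mask, momentum_attn=focused))
```

In this sketch, attention that is concentrated inside the box yields a lower loss than uniform attention under both variants; the blended target simply keeps the box constraint from being enforced as a hard mask.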

Why is it important?

It can better align visual and language features, resulting in better performance in Visual Grounding.

Perspectives

Writing this paper was straightforward because our comprehensive analysis led to a clear motivation.

Weitai Kang
University of Illinois at Chicago

Read the Original

This page is a summary of: Visual Grounding with Attention-Driven Constraint Balancing, October 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3746027.3755338.