What is it about?
Foundation models like CLIP and LLaVA can automatically describe images and assign labels to help train machine learning models. This ability is critical in large-scale tasks like object recognition or segmentation, where manually labeling data would be extremely time-consuming. However, these automatically generated labels are often noisy, misaligned, or inaccurate, and poor labels can lead to unreliable AI systems.

Our work introduces VISTA, a human-in-the-loop visual analytics system that helps people detect, explore, and fix issues in these foundation model-generated labels. Instead of relying purely on automated tools or small hand-labeled samples, VISTA provides interactive visual summaries and quantification tools that help humans validate data quality at scale. It groups related image-label pairs, measures how well each label matches the image content, and guides users toward recurring patterns of problems, such as irrelevant tags or mismatches between labels and object locations. By combining the strengths of machine intelligence (to organize and measure) with human judgment (to interpret and refine), VISTA produces better training data and more trustworthy AI models.
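The paper describes VISTA's actual grouping and quantification pipeline in detail; purely to illustrate the core idea of measuring how well a label matches an image, here is a minimal sketch that scores each image-label pair with CLIP embedding similarity and flags low-scoring pairs for human review. The model choice, threshold, and function names below are illustrative assumptions, not VISTA's implementation.

```python
# Minimal sketch (not VISTA's implementation): score image-label
# agreement with CLIP and flag poorly matching pairs for human review.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Model choice is an illustrative assumption.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def alignment_score(image: Image.Image, label: str) -> float:
    """Cosine similarity between the CLIP image and text embeddings."""
    inputs = processor(text=[label], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img @ txt.T).item()

def flag_suspect_labels(pairs, threshold=0.20):
    """Return (index, label, score) for image-label pairs whose label
    poorly matches the image; these are surfaced for human review.
    The 0.20 cutoff is a hypothetical value, not from the paper."""
    suspects = []
    for i, (image_path, label) in enumerate(pairs):
        image = Image.open(image_path).convert("RGB")
        score = alignment_score(image, label)
        if score < threshold:
            suspects.append((i, label, score))
    return suspects
```

In a real validation workflow, scores like these would feed the kind of interactive visual summaries the paper describes, so a person can inspect clusters of low-alignment pairs rather than reviewing every sample.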
Featured image: Photo by ev on Unsplash.
Why is it important?
As AI models get larger and more powerful, the quality of their training data becomes even more critical. Foundation models can generate labels for huge datasets, but without human oversight those labels may be inconsistent, biased, or misleading, and validating data at this scale is difficult and expensive.

VISTA is one of the first systems to provide a scalable, visually guided solution for quality-checking foundation model outputs. It bridges the gap between automated label generation and high-quality human curation without requiring labor-intensive manual annotation of every sample. This is especially timely as the community shifts toward data-centric AI, where the focus is no longer just on better models but on better data. Our results show that VISTA improves model performance and helps researchers quickly identify recurring problems. It sets a new standard for how visual tools can support AI data workflows in domains like autonomous driving, healthcare, and robotics, where label quality has real-world consequences.
Perspectives
Working on VISTA has been a powerful reminder of how important human insight remains in the age of increasingly capable AI. Foundation models have impressive capabilities, but their outputs are not infallible. Through this project, I wanted to create a system that empowers people to collaborate with AI, bringing intuition, domain expertise, and reasoning into the data validation process. It was especially exciting to co-design the system with machine learning practitioners, grounding the tool in their practical needs and day-to-day workflows. I hope VISTA inspires others to invest in human-centered tools that make AI more reliable, interpretable, and responsible.
Xiwei Xuan
University of California Davis
Read the Original
This page is a summary of: VISTA: A Visual Analytics Framework to Enhance Foundation Model-Generated Data Labels, IEEE Transactions on Visualization and Computer Graphics, January 2025, Institute of Electrical & Electronics Engineers (IEEE). DOI: 10.1109/tvcg.2025.3535896.