What is it about?

This research introduces DCA-Bench, a new benchmark designed to test whether AI models can help spot hidden problems in datasets, a task known as dataset curation. Instead of fixing already-known issues, DCA-Bench challenges AI models to identify real-world problems by showing them tricky examples and asking, “Can you find what’s wrong here?” To fairly evaluate how well AI models perform this task, the authors also created a smart evaluation method that uses another AI system to judge the answers. Experiments show that while today’s top language models show promise, they still struggle with subtle or complex issues, pointing to areas for future improvement.
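
For readers curious how such a setup looks in practice, below is a minimal sketch of the general "curator plus judge" pattern described above. It is a hypothetical illustration rather than the benchmark's actual code: the ask_llm helper, the model name, the example case, and the prompts are all placeholders.

```python
# Minimal, hypothetical sketch of the "curator + judge" pattern described above.
# The example case, prompts, and model name are placeholders, not DCA-Bench's
# actual data or code.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

def ask_llm(prompt: str) -> str:
    """Send a single prompt to a chat model and return its reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # arbitrary model choice for illustration
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# One invented test case: dataset context hiding an issue, plus a reference
# description of that issue for the judge to compare against.
case = {
    "context": "README says labels range from 0 to 4, but labels.csv contains the value 7.",
    "reference_issue": "Label values in labels.csv fall outside the range documented in the README.",
}

# Step 1: the "curator" agent inspects the material and reports what it finds.
curator_answer = ask_llm(
    "You are a dataset curation assistant. Inspect the following material and "
    "describe any hidden quality issue you find:\n\n" + case["context"]
)

# Step 2: a second model acts as the judge, checking whether the curator's
# answer matches the known issue.
verdict = ask_llm(
    "Reference issue: " + case["reference_issue"] + "\n"
    "Agent's answer: " + curator_answer + "\n"
    "Does the agent's answer identify the same underlying issue? Answer yes or no."
)

print("Issue detected:", verdict.strip().lower().startswith("yes"))
```

The benchmark's actual judging is more involved than this yes/no check; the sketch only conveys the two-stage structure in which one model proposes an issue and another model evaluates the proposal.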

Why is it important?

Data is the fuel of modern AI: whether we are training or evaluating models, we rely on datasets. As a result, open data platforms play a vital role in the research community. However, researchers frequently encounter hidden issues when using these datasets. These issues go beyond the raw data; they also affect documentation, metadata, associated code scripts, and the consistency across these components. Unfortunately, such problems are typically reported only by human maintainers and users after extensive manual investigation, requiring non-trivial effort.

Perspectives

DCA-Bench provides a foundation and testbed to support future research in this direction as the first benchmark to test an AI agent's ability to assist in detecting hidden issues in open dataset repositories. Further improvements could address current limitations, such as the uni-modal context and the relatively limited number of cases.

Benhao Huang
Carnegie Mellon University Department of Computer Science

Read the Original

This page is a summary of: DCA-Bench: A Benchmark for Dataset Curation Agents, August 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3711896.3737422.
You can read the full text via the DOI above.


Contributors

The following have contributed to this page