What is it about?
AI models that work on networks (like citation graphs or product co-purchase graphs) often rely on labels that can be wrong. Past tests used overly simple, random mistakes. This paper introduces BeGIN, a benchmark that simulates more realistic, item-specific errors, including ones made by a large language model acting like a human annotator. It compares many model types and training strategies, finding that (1) human-like mistakes are harder to handle, (2) robustness depends on the graph type, and (3) some architectures, like GraphSAGE, cope better. BeGIN aims to help researchers design AI that stays accurate even when the data is messy.
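To make the idea of "item-specific" errors concrete, below is a minimal sketch of instance-dependent label noise on a node-classification dataset, assuming NumPy feature and label arrays. The function name, the centroid-similarity heuristic, and the parameters are illustrative assumptions for this summary, not the exact annotator simulation used in BeGIN.

```python
# A minimal sketch of instance-dependent label noise injection (NumPy only).
# X: (n, d) node feature matrix; y: (n,) integer labels in [0, num_classes).
import numpy as np

def inject_instance_noise(X, y, num_classes, base_rate=0.2, seed=0):
    rng = np.random.default_rng(seed)
    # Per-class feature centroids (assumes every class has at least one node).
    centroids = np.stack([X[y == c].mean(axis=0) for c in range(num_classes)])
    sims = X @ centroids.T                               # (n, C) node-to-class similarity
    # Softmax over classes: how confusable each node is with each class.
    probs = np.exp(sims - sims.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    noisy = y.copy()
    for i, label in enumerate(y):
        # Ambiguous nodes (low resemblance to their own class) flip more often,
        # and they flip toward classes they actually resemble -- unlike uniform
        # random noise, which picks a wrong class with equal probability.
        flip_prob = base_rate * (1.0 - probs[i, label])
        if rng.random() < flip_prob:
            wrong = probs[i].copy()
            wrong[label] = 0.0
            noisy[i] = rng.choice(num_classes, p=wrong / wrong.sum())
    return noisy
```

Training the same model once on the clean labels and once on the output of inject_instance_noise, then comparing test accuracy, gives a rough, do-it-yourself version of the robustness comparison the benchmark performs at scale.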
Why is it important?
Most real-world graph data contains complex, instance-dependent labeling mistakes. Without evaluating graph neural networks (GNNs) under these realistic conditions, we risk developing models that perform well in theory but fail when applied to real-world problems.
Perspectives
I aimed to explore how different types of label noise affect Graph Neural Networks and to investigate ways to address the challenges they create. My hope is that others can build on these results to develop more robust models that perform reliably in real-world settings. More than anything, I hope this work encourages researchers to look beyond simplified assumptions and tackle the messy, complex nature of real data.
Suyeon Kim
Pohang University of Science and Technology
Read the Original
This page is a summary of: Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark, August 2025, ACM (Association for Computing Machinery).
DOI: 10.1145/3711896.3737376.