What is it about?
This paper proposes AlignCP, a lightweight, interpretable, data-aware preference alignment framework designed to mitigate the negative impact of noise in preference data (e.g., label flipping, inconsistent judgments) on DPO-style preference optimization. Through empirical analysis on datasets such as Anthropic-HH, we find that approximately 25% of preference pairs exhibit significant inconsistency between reward model predictions and human annotations, and such samples often fail to provide useful training signal or even degrade model behavior.

Based on this observation, AlignCP constructs two interpretable signals from reward model outputs: Confidence, which measures the certainty of the preference decision, and Polarity, which characterizes whether the reward ranking direction agrees with the human annotation. By jointly designing sample weights from these two signals, AlignCP concentrates training on preference pairs with high confidence and consistent polarity while suppressing low-confidence or direction-conflicting samples, thereby filtering out noise without requiring any data relabeling.

Experimental results demonstrate that AlignCP significantly outperforms DPO and several of its variants on benchmarks such as Anthropic-HH, achieving consistent improvements in both helpfulness and safety metrics and exhibiting stronger robustness under noisy conditions such as label flipping. Overall, AlignCP provides an automated, interpretable, and efficient quality-control and reweighting strategy for preference data, offering a practical route to robustly improving LLM preference alignment.
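The confidence/polarity reweighting idea can be sketched in a few lines. This is an illustrative reconstruction, not the paper's exact formulation: the function names, the Bradley-Terry-style confidence mapping, and the `conflict_scale` parameter for suppressing direction-conflicting pairs are all assumptions made for clarity here.

```python
import math

def aligncp_weight(r_chosen: float, r_rejected: float,
                   tau: float = 1.0, conflict_scale: float = 0.1) -> float:
    """Hypothetical per-pair weight from reward-model scores for the
    human-chosen and human-rejected responses.

    Confidence: how far the implied Bradley-Terry win probability is
    from an uninformative 0.5 (scaled into [0, 1)).
    Polarity: +1 when the reward ranking agrees with the human label,
    -1 when it conflicts.
    """
    margin = r_chosen - r_rejected
    p_win = 1.0 / (1.0 + math.exp(-margin / tau))
    confidence = abs(2.0 * p_win - 1.0)
    polarity = 1.0 if margin >= 0 else -1.0
    # Upweight confident, consistent pairs; strongly damp conflicting ones.
    return confidence if polarity > 0 else conflict_scale * confidence

def weighted_dpo_loss(logit_margins, weights, beta: float = 0.1) -> float:
    """DPO-style sigmoid loss with per-sample weights (sketch only).
    logit_margins: policy/reference log-ratio differences per pair."""
    losses = [-math.log(1.0 / (1.0 + math.exp(-beta * m)))
              for m in logit_margins]
    total_w = sum(weights) or 1.0
    return sum(w * l for w, l in zip(weights, losses)) / total_w
```

Under this sketch, a pair the reward model confidently ranks the same way as the annotator (large positive margin) receives a weight near 1, while a pair it confidently ranks the opposite way is damped by `conflict_scale`, so noisy labels contribute little gradient.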
Why is it important?
It eliminates the need for extensive human intervention and incurs only minimal additional computational overhead.
Read the Original
This page is a summary of: AlignCP: Noise-Aware Preference Alignment for LLMs via Confidence and Polarity Reweighting, April 2026, ACM (Association for Computing Machinery),
DOI: 10.1145/3774904.3792972.