What is it about?

We propose a new approach to evaluating fairness measures in machine learning. Fairness measures typically assess whether models unfairly discriminate against certain individuals or groups based on protected attributes such as gender or race. Our method tests how well a given fairness measure distinguishes between different machine learning models, as well as among individual data points. It builds on item response theory, a popular framework from education for designing exams that can distinguish the abilities of a group of test-takers. In this analogy, each individual data point is an item (question) in the exam, each machine learning predictor is a student taking the exam, and the fairness measure provides the resulting response score. We show how to assess whether a fairness measure is challenging enough and whether it distinguishes between different machine learning algorithms, while at the same time revealing how fair those algorithms are.
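To make the analogy more concrete, here is a minimal sketch in Python of how an item response theory model could be fitted to fairness outcomes. It assumes a standard two-parameter logistic (2PL) IRT model, binarized fairness outcomes, synthetic data, and a simple gradient-ascent fit; the model and fitting procedure used in the paper itself may differ, and all names below are purely illustrative.

# A minimal sketch of the item response theory (IRT) analogy described above.
# Assumptions (not from the paper): a standard two-parameter logistic (2PL) model,
# binarized fairness outcomes, and an unregularized gradient-ascent fit.
# Rows of the response matrix are ML models ("students"), columns are data points
# ("items"), and each entry records whether the fairness measure judges that
# model's prediction on that data point to be fair.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_points = 5, 200

# Synthetic "true" parameters: model ability, item difficulty, item discrimination.
ability = rng.normal(0, 1, n_models)
difficulty = rng.normal(0, 1, n_points)
discrimination = rng.lognormal(0, 0.3, n_points)

def p_fair(theta, a, b):
    # 2PL model: probability that a model is judged fair on a data point.
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

# Simulated binary fairness outcomes (1 = the measure judges the prediction fair).
responses = rng.binomial(1, p_fair(ability, discrimination, difficulty))

# Fit parameters by gradient ascent on the Bernoulli log-likelihood.
theta = np.zeros(n_models)   # estimated model "fairness abilities"
a = np.ones(n_points)        # estimated item discriminations
b = np.zeros(n_points)       # estimated item difficulties
lr = 0.1
for _ in range(2000):
    p = p_fair(theta, a, b)
    err = responses - p                                   # residuals drive all updates
    theta += lr * (err * a).mean(axis=1)                  # update model abilities
    b += lr * (-err * a).mean(axis=0)                     # update item difficulties
    a += lr * (err * (theta[:, None] - b)).mean(axis=0)   # update discriminations

print("Estimated model abilities:", np.round(theta, 2))
print("Hardest data points for fairness:", np.argsort(-b)[:5])

In this sketch, a model with a high estimated ability tends to be judged fair across many data points, while data points with high difficulty are the ones on which even otherwise fair models struggle; the fitted parameters are what make the evaluation interpretable.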

Why is it important?

As AI is increasingly deployed in society, discriminatory and unfair predictive models have become a serious concern. Our interpretable approach, based on educational testing, helps us choose suitable fairness measures, similar to how high-stakes exams such as university entrance exams are designed.

Perspectives

As artificial intelligence (AI) gains broad uptake in society, it is increasingly important to have cross-pollination between different fields of research. In this particular case, we draw on knowledge from educational testing to create a way to evaluate fairness in machine learning.

Cheng Soon Ong
CSIRO

Read the Original

This page is a summary of: Fairness Evaluation with Item Response Theory, April 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3696410.3714883.
You can read the full text via the DOI above.
