What is it about?
Many analyses have been performed on Information Retrieval (IR) evaluation benchmarks, and benchmarking also plays a central role in evaluating the capabilities of Large Language Models (LLMs). In this paper, we apply an IR approach to LLM evaluation. Adapting a method originally developed for TREC test collections, we analyze LLM benchmark results through the lens of network science. We construct a bipartite graph between models and benchmark questions and apply Kleinberg's HITS algorithm to uncover latent structure in the evaluation data. In this framework, model hubness quantifies a model's tendency to perform well on easy questions, while question hubness captures how well a question discriminates between more and less effective models. We conduct experiments on seven multiple-choice QA benchmarks with a pool of 34 LLMs. Through this IR-inspired approach, we show that the ranking of models on leaderboards is strongly influenced by subsets of easy questions.
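The idea of running HITS on a model–question graph can be illustrated with a minimal sketch. This is not the paper's implementation: the toy correctness matrix, the edge direction (model → question when the answer is correct), and the iteration count are all assumptions made for illustration. Under this orientation, models play the role of hubs and questions the role of authorities, so an "easy" question answered by many models accrues a high authority score, and a model that answers many such questions accrues a high hub score.

```python
import numpy as np

# Hypothetical toy data: binary correctness matrix A (models x questions),
# where A[m, q] = 1 if model m answered question q correctly.
A = np.array([
    [1, 1, 1, 0],   # strong model: answers three questions
    [1, 1, 0, 0],   # middling model
    [1, 0, 0, 0],   # weak model: only answers the easiest question
], dtype=float)

def hits(adj, iters=100):
    """Kleinberg's HITS via power iteration on a bipartite adjacency matrix.
    Rows (models) receive hub scores; columns (questions) receive
    authority scores."""
    hubs = np.ones(adj.shape[0])
    auth = np.ones(adj.shape[1])
    for _ in range(iters):
        auth = adj.T @ hubs              # authority = sum of incoming hub scores
        auth /= np.linalg.norm(auth)     # normalize to keep scores bounded
        hubs = adj @ auth                # hub = sum of authority scores reached
        hubs /= np.linalg.norm(hubs)
    return hubs, auth

model_hubs, question_auths = hits(A)
```

In this toy run, question 0 (answered by every model) ends up with the highest authority score, matching the intuition that easy questions dominate, while question 3 (answered by no model) scores zero. Scoring "question hubness" as described in the paper would use the reverse edge orientation; this sketch shows only one direction of the computation.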
Read the Original
This page is a summary of: Analyzing AI Evaluation Benchmarks Through Information Retrieval and Network Science, January 2026, Springer Science + Business Media,
DOI: 10.1007/978-3-032-21300-6_25.