What is it about?
Patient Information Leaflets (PILs) provide essential information about medication usage, side effects, precautions, and interactions, making them a valuable resource for Question Answering (QA) systems in healthcare. However, no dedicated benchmark currently exists for evaluating QA systems on PILs, which limits progress in this domain. To address this gap, we introduce a fact-supported synthetic benchmark of multiple-choice questions and answers generated from real PILs. The benchmark is built with a fully automated pipeline that leverages multiple Large Language Models (LLMs) to generate diverse, realistic, and contextually relevant question-answer pairs, and it is publicly released as a standardized evaluation framework for assessing how well LLMs process and reason over PIL content. To validate its effectiveness, we conduct an initial evaluation with state-of-the-art LLMs, showing that the benchmark poses a realistic and challenging task and provides a solid foundation for advancing QA research in the healthcare domain.
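This page does not spell out the generation pipeline in detail, so the sketch below is only a rough, hypothetical illustration of the general idea: prompting an LLM to turn a single PIL excerpt into a fact-supported multiple-choice question. The call_llm function, the prompt wording, and the JSON schema are assumptions made for demonstration, not the authors' actual method or models.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM API call (not the authors' pipeline).
    It returns a canned response so the sketch runs end to end."""
    return json.dumps({
        "question": "What is the maximum number of tablets allowed in 24 hours?",
        "options": ["4", "6", "8", "12"],
        "answer": "8",
        "supporting_fact": "Do not take more than 8 tablets in 24 hours.",
    })

def generate_mcq(pil_excerpt: str) -> dict:
    """Prompt the model for one multiple-choice question grounded in the PIL text,
    returning the question, options, answer, and the supporting fact from the leaflet."""
    prompt = (
        "Read the following patient information leaflet excerpt and write one "
        "multiple-choice question with four options. Return JSON with keys "
        "'question', 'options', 'answer', and 'supporting_fact' (a sentence "
        "copied verbatim from the excerpt).\n\n" + pil_excerpt
    )
    return json.loads(call_llm(prompt))

if __name__ == "__main__":
    mcq = generate_mcq("Do not take more than 8 tablets in 24 hours.")
    print(mcq["question"], mcq["options"], mcq["answer"])
```

In a real pipeline, the canned response would be replaced by calls to one or more actual LLMs, with the "supporting_fact" field serving to keep each generated question grounded in the source leaflet.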
Read the Original
This page is a summary of: PILs of Knowledge: A Synthetic Benchmark for Evaluating Question Answering Systems in Healthcare, July 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3726302.3730283.