What is it about?
Question answering (QA) systems enable computers to respond to human questions posed in natural language, as in virtual assistants or search engines. While many QA systems exist, quantitatively measuring their performance remains challenging. This survey explains the main ways QA systems are evaluated, dividing them into two groups: scores based on human judgment and scores calculated automatically.
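To make the automatic side concrete, below is a minimal Python sketch of two widely used lexical-overlap metrics, exact match and token-level F1, as popularized by SQuAD-style benchmarks. The survey covers many more measures; the function names and normalization choices here are illustrative assumptions, not code from the paper.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace
    (the normalization typically applied by SQuAD-style evaluation scripts)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall against the gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
print(round(token_f1("in Paris, France", "Paris"), 2))  # 0.5
```

Metrics like these are cheap and reproducible, which is exactly why they are popular; the trade-off the survey highlights is that string overlap can miss answers that are correct but worded differently, which is where human judgment comes in.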
Why is it important?
Evaluating question answering systems is essential for advancing more natural and reliable human–machine interactions. By clarifying how existing evaluation methods work, where they fall short, and how they can be systematically categorized, this work provides researchers with clearer guidance for comparing systems and developing more robust QA technologies.
Read the Original
This page is a summary of: Evaluation of Question Answering Systems: Complexity of Judging a Natural Language, ACM Computing Surveys, August 2025, ACM (Association for Computing Machinery). DOI: 10.1145/3744663.