What is it about?

It has become standard to evaluate Natural Language Processing (NLP) algorithms on multiple datasets in order to ensure consistent performance across varied setups. When doing so, one has to adjust the statistical analysis of the results to account for all of the tested hypotheses (one for each dataset). In this paper we explain how to perform such an analysis, with special consideration for NLP applications.
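To illustrate why testing one hypothesis per dataset calls for an adjustment, here is a minimal sketch of Holm's step-down correction, a standard multiple-comparison procedure. It is shown only as an example and is not necessarily the procedure the paper recommends. The function name `holm_reject` and the p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null hypothesis is rejected.

    Holm's step-down procedure controls the family-wise error rate at `alpha`.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared to alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Stop at the first failure; all larger p-values are not rejected.
            break
    return reject


# Hypothetical p-values, one per dataset (illustrative only, not from the paper).
p_values = [0.001, 0.04, 0.03, 0.2]
decisions = holm_reject(p_values)
print(decisions)
```

With these made-up values, three of the four p-values fall below 0.05 when each is read in isolation, but only the smallest one survives the correction. That gap is the kind of overcounting a multiple-dataset analysis has to guard against.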

Why is it important?

Correct and valid statistical analysis is crucial in an empirical research area such as NLP. The goal of this paper is to help researchers carry out that analysis in a valid way.

Read the Original

This page is a summary of: Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets, December 2017, The MIT Press,
DOI: 10.1162/tacl_a_00074.