What is it about?

Two-sentence summary: A p-value testing a null hypothesis is completely different from a p-value testing a Bayesian model. This paper demonstrates that if the former p-value is sufficiently low and if the latter p-value is sufficiently high, then the model's posterior probability of the null hypothesis is low.

Much of the recent criticism of null hypothesis significance testing comes from Bayesians, many of whom support checking priors or other aspects of Bayesian models. It turns out that if a p-value for testing a null hypothesis is sufficiently low, then a Bayesian model that still assigns the null hypothesis a high posterior probability would fail a prior predictive check. That check leads to a new way to calibrate p-values. Although the calibrated p-values need not be compared to a fixed significance threshold, they do depend on the threshold chosen for checking the Bayesian model, because the Bayesian model is rejected whenever its prior predictive p-value falls below that threshold.
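To make the three quantities in the summary concrete, here is a minimal sketch in Python for a toy model. It is not the paper's calibration method. Everything in it is an assumption chosen for illustration: a single observed z-statistic with unit variance, a null hypothesis that the mean is 0 with prior probability prior_null, a normal prior with standard deviation tau on the mean under the alternative, and the absolute value of z as the test statistic for the prior predictive check. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z, sd=1.0):
    """Density at z of a normal distribution with mean 0 and standard deviation sd."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def two_sided_tail(z, sd=1.0):
    """P(|Z| >= |z|) when Z is normal with mean 0 and standard deviation sd."""
    return 2.0 * (1.0 - normal_cdf(abs(z) / sd))


def summarize(z_obs, prior_null=0.5, tau=3.0):
    """Return (null p-value, prior predictive p-value, posterior probability of the null).

    Assumed toy model (not taken from the paper):
      null:        mean = 0, with prior probability prior_null
      alternative: mean ~ Normal(0, tau^2)
      data:        z_obs ~ Normal(mean, 1)
    """
    # Marginally, z_obs has standard deviation sqrt(1 + tau^2) under the alternative.
    sd_alt = math.sqrt(1.0 + tau ** 2)

    # p-value for testing the null hypothesis.
    p_null = two_sided_tail(z_obs)

    # p-value for checking the whole Bayesian model: the tail probability of |z|
    # under the prior predictive distribution, a mixture of the two marginals.
    p_prior_pred = (prior_null * p_null
                    + (1.0 - prior_null) * two_sided_tail(z_obs, sd_alt))

    # Posterior probability of the null hypothesis by Bayes' theorem.
    m0 = normal_pdf(z_obs)
    m1 = normal_pdf(z_obs, sd_alt)
    post_null = prior_null * m0 / (prior_null * m0 + (1.0 - prior_null) * m1)

    return p_null, p_prior_pred, post_null


if __name__ == "__main__":
    p_null, p_prior_pred, post_null = summarize(z_obs=4.0)
    print("null p-value:", p_null)
    print("prior predictive p-value:", p_prior_pred)
    print("posterior probability of the null:", post_null)
```

With the assumed z_obs of 4, the null p-value is very low, the prior predictive p-value stays above a conventional model-checking threshold such as 0.05 (itself an assumption here), and the posterior probability of the null is low, which matches the direction of the paper's main result in this one toy case.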

Why is it important?

Some have gone beyond criticizing the abuse of the p-value to criticizing the p-value itself, leading to bans of null hypothesis significance testing. This paper demonstrates that very low p-values nonetheless indicate that the null hypothesis should be rejected even from a Bayesian point of view. The main result therefore suggests that some criticisms of null hypothesis significance testing require qualification and nuance. The paper also discusses how the resulting calibrated p-values may prove useful for both frequentists and Bayesians. (The proposed method of calibrating p-values differs from the traditional calibrations based on bounds of Bayes factors.)


The main result is relevant to debates about the replication crisis that are taking place not only in the statistics literature but also in the literature of multiple fields of science. For further reading, follow the "Related papers" link.

David R. Bickel
University of North Carolina at Greensboro

Read the Original

This page is a summary of: Null hypothesis significance testing defended and calibrated by Bayesian model checking, The American Statistician, December 2019, Taylor & Francis, DOI: 10.1080/00031305.2019.1699443.