24 April 2023

what is a p-value?

  • a p-value is the (worst-case) probability of obtaining a test statistic at least as extreme as the observed one (in a hypothetical repetition of the study), assuming that the model and the null hypothesis are correct

  • mathematically, it is a conditional probability evaluated at the observed data: as such, it is a random (i.e. data-dependent) probability value

  • it is typically uniformly distributed on the interval [0,1] under the null hypothesis, while it is more concentrated towards 0 under the alternative hypothesis (see the simulation sketch after this list)

  • hence, small p-values correspond to statistical evidence in favor of the alternative hypothesis vs the null hypothesis
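
  • a minimal simulation sketch of the last two points (my own illustration, not from the original notes; assumes NumPy and SciPy): under the null the p-values are roughly uniform, under the alternative they pile up near 0

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      n, reps = 30, 10_000

      def simulate_pvalues(true_mean):
          """p-values of a two-sided one-sample t-test of H0: mean = 0."""
          samples = rng.normal(loc=true_mean, scale=1.0, size=(reps, n))
          return stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

      p_null = simulate_pvalues(true_mean=0.0)  # null hypothesis true: roughly uniform
      p_alt = simulate_pvalues(true_mean=0.5)   # alternative true: concentrated near 0

      print("share of p < 0.05 under H0:", np.mean(p_null < 0.05))  # close to 0.05
      print("share of p < 0.05 under H1:", np.mean(p_alt < 0.05))   # much larger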

why retire the p-value?

  • it is important to distinguish between p-value and statistical significance (p<0.05)

  • the latter is criticized much more than the former

  • the word “significance” is problematic because of its general meaning (but statistical significance depends on the sample size, while real-world significance does not; see the sketch after this list)

  • the real problem is dichotomizing p-values and attaching too much importance to statistical significance (see examples of craving for significance)

  • unfortunately, how papers are read is more important than how they are written
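
  • a hedged illustration of the sample-size point above (my own sketch, hypothetical numbers): the same negligible effect is “non-significant” in a small sample and “highly significant” in a huge one

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      true_effect = 0.02  # 2% of a standard deviation: irrelevant in practice

      for n in (100, 1_000_000):
          x = rng.normal(loc=true_effect, scale=1.0, size=n)
          result = stats.ttest_1samp(x, popmean=0.0)
          print(f"n = {n:>9,}: estimate = {x.mean():+.3f}, p = {result.pvalue:.2g}")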

Bayesianism as an alternative?

  • the main problem with p-values is their misuse: unfortunately, misuse of Bayesian methods is no better than misuse of frequentist ones

  • the standard Bayesian statistical approach requires a subjective prior and delivers a subjective posterior, which may suit exploratory research better than confirmatory research (see the prior-sensitivity sketch after this list)

  • ignorance cannot be expressed by a probabilistic prior: this is the main reason for the development of classical statistics
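
  • a small conjugate Beta-Binomial sketch of the prior-sensitivity point (hypothetical data and priors, not from the notes): the same data combined with different subjective priors lead to noticeably different posteriors

      from scipy import stats

      successes, trials = 7, 20  # hypothetical data: 7 successes in 20 trials

      priors = {
          "flat Beta(1, 1)      ": (1, 1),
          "sceptical Beta(2, 8) ": (2, 8),
          "optimistic Beta(8, 2)": (8, 2),
      }

      for name, (a, b) in priors.items():
          # conjugate update: posterior is Beta(a + successes, b + failures)
          post = stats.beta(a + successes, b + trials - successes)
          low, high = post.ppf(0.025), post.ppf(0.975)
          print(f"{name}: posterior mean = {post.mean():.2f}, "
                f"95% interval = ({low:.2f}, {high:.2f})")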

is there a reproducibility crisis?

Baker: Is there a reproducibility crisis? [Nature, 2016]

accept uncertainty!

  • deduction: All men are mortal. Socrates is a man. Therefore, Socrates is mortal.

  • induction: Will the sun rise tomorrow?

  • statistical significance is often misused to obtain certainty from induction, which is impossible

  • a central tenet of classical statistics is placing bounds on the probability of wrong conclusions

  • the significance level is a bound on the probability of false positives
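
  • spelled out (my notation, assuming a valid p-value, i.e. one that is uniform or super-uniform under the null): the test that rejects \(H_0\) whenever \(p \le \alpha\) satisfies
    \(\qquad\Pr(\text{reject } H_0 \mid H_0) = \Pr(p \le \alpha \mid H_0) \le \alpha\)
    so with \(\alpha = 0.05\) the probability of a false positive is at most 5%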

plan ahead and be transparent!

  • writing a statistical analysis plan (before looking at the data) helps against questionable research practices such as p-hacking and data dredging (illustrated in the sketch after this list)
    \(\qquad\)if you torture the data enough, nature will always confess [Coase, 1982]

  • sensitivity analyses help assess the robustness and credibility of the results
    \(\qquad\)all models are wrong, but some are useful [Box, 1987]
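
  • a minimal p-hacking sketch (my own illustration, pure-noise data): running 20 unplanned tests and reporting the smallest p-value inflates the false-positive rate far above the nominal 5%, which is exactly what a pre-specified analysis plan guards against

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2)
      reps, n, n_tests = 5_000, 50, 20

      planned_hits, hacked_hits = 0, 0
      for _ in range(reps):
          data = rng.normal(size=(n_tests, n))   # no real effect anywhere
          pvals = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue
          planned_hits += pvals[0] < 0.05        # single pre-specified test
          hacked_hits += pvals.min() < 0.05      # cherry-picked best result

      print("pre-specified test: false-positive rate =", planned_hits / reps)  # about 0.05
      print("best of 20 tests:   false-positive rate =", hacked_hits / reps)   # about 0.64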

report estimates and uncertainty!

  • standard reporting format for effect size: estimate [95% CI] p-value (see the worked sketch after this list)

  • CIs also express statistical significance, but are rarely criticized
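
  • a worked sketch of this reporting format (hypothetical two-group data, equal-variance t-test): the effect estimate and its 95% confidence interval are reported alongside the p-value instead of a bare significance verdict

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(3)
      treatment = rng.normal(loc=0.4, scale=1.0, size=80)
      control = rng.normal(loc=0.0, scale=1.0, size=80)

      estimate = treatment.mean() - control.mean()
      result = stats.ttest_ind(treatment, control, equal_var=True)

      # pooled-variance standard error and 95% CI, matching the equal-variance t-test
      n1, n2 = len(treatment), len(control)
      pooled_var = ((n1 - 1) * treatment.var(ddof=1)
                    + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
      se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
      t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)
      ci_low, ci_high = estimate - t_crit * se, estimate + t_crit * se

      print(f"difference in means: {estimate:.2f} "
            f"[95% CI {ci_low:.2f} to {ci_high:.2f}], p = {result.pvalue:.3f}")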

ASA statement on p-values: principles

  • p-values can indicate how incompatible the data are with a specified statistical model [and the null hypothesis]

  • p-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone

  • scientific conclusions and business [?] or policy decisions should not be based only on whether a p-value passes a specific threshold

  • proper inference requires full reporting and transparency [but synthesis is also important]

  • a p-value, or statistical significance, does not measure the size of an effect or the importance of a result

  • by itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis [sensitivity analyses are essential]

summary