safe testing / e-values / anytime-valid inference

Marco Cattaneo

Department of Clinical Research, University of Basel

5 March 2024

e-value

  • an e-value is a statistic \(E\geq 0\) such that \[\mathbb{E}[E]\leq 1\quad \textsf{under}\ H_0\]

  • it is a (one-sided) test statistic: we expect small values under \(H_0\) and larger ones under \(H_1\)

  • examples:

    • likelihood ratio \(\mathbb{P}_1(\textsf{data}) / \mathbb{P}_0(\textsf{data})\) when \(H_0\) and \(H_1\) are simple
    • Bayes factor \(\mathbb{P}_1(\textsf{data}) / \mathbb{P}_0(\textsf{data})\) when \(H_0\) is simple and \(\mathbb{P}_1\) is the marginal probability under a prior on \(H_1\)
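
A minimal sketch of the likelihood-ratio example, assuming a binomial model with hypothetical values \(n=10\), \(p_0=0.5\), \(p_1=0.7\) (none of these come from the slides). It computes the expectation of the e-value exactly by summing over all possible outcomes: the result is 1 under \(H_0\) and larger than 1 under \(H_1\).

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def lr_e_value(k, n, p0, p1):
    """Likelihood ratio P1(data) / P0(data) for k successes in n trials."""
    return (p1 / p0)**k * ((1 - p1) / (1 - p0))**(n - k)

# Hypothetical setting, chosen only for illustration
n, p0, p1 = 10, 0.5, 0.7

# Exact expectations of the e-value under each hypothesis
mean_under_h0 = sum(binom_pmf(k, n, p0) * lr_e_value(k, n, p0, p1) for k in range(n + 1))
mean_under_h1 = sum(binom_pmf(k, n, p1) * lr_e_value(k, n, p0, p1) for k in range(n + 1))

print(f"E[E] under H0: {mean_under_h0:.6f}")
print(f"E[E] under H1: {mean_under_h1:.6f}")
```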

growth-rate optimality

  • Grünwald et al. (2024) introduce GRO as an alternative to power for selecting the best e-value

  • it essentially corresponds to maximizing the (worst-case) \[\mathbb{E}[\log(E)]\quad \textsf{under}\ H_1\]

  • GRO e-values turn out to be Bayes factors with special priors
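
A small sketch of the growth-rate criterion in the simplest case only: one Bernoulli observation with simple \(H_0\) and \(H_1\) (the values \(p_0=0.5\), \(p_1=0.7\) and the candidate grid are assumptions for illustration). Each candidate \(E_q\) is the likelihood ratio of Bernoulli(\(q\)) against \(H_0\), hence an e-value. Among these candidates, the expected log e-value under \(H_1\) is largest at \(q=p_1\), where it equals the Kullback-Leibler divergence between \(H_1\) and \(H_0\). The worst-case construction for composite hypotheses in Grünwald et al. (2024) is not reproduced here.

```python
from math import log

# Simple H0 and H1 for one Bernoulli observation (hypothetical values)
p0, p1 = 0.5, 0.7

def growth_rate(q):
    """E[log E_q] under H1, where E_q is the likelihood ratio of Bernoulli(q) against H0."""
    return p1 * log(q / p0) + (1 - p1) * log((1 - q) / (1 - p0))

# Candidate alternatives q = 0.05, 0.10, ..., 0.95
candidates = [i / 20 for i in range(1, 20)]
best_q = max(candidates, key=growth_rate)

# Kullback-Leibler divergence of H1 from H0
kl_divergence = p1 * log(p1 / p0) + (1 - p1) * log((1 - p1) / (1 - p0))

print(f"best q: {best_q}, growth rate: {growth_rate(best_q):.4f}, KL: {kl_divergence:.4f}")
```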

optional continuation

  • Doob’s optional stopping theorem implies that the product of e-values from independent samples is always an e-value, even when the decision to collect new samples arbitrarily depends on previous samples

  • Markov’s inequality implies that rejecting \(H_0\) when \[E\geq 1/\alpha\] is a (conservative) significance test at level \(\alpha\)
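
The two bullets above can be checked exactly in a toy setting. The sketch below assumes Bernoulli data collected in batches of 10, with hypothetical \(p_0=0.5\), \(p_1=0.7\), at most 20 batches, and the data-dependent rule "stop as soon as the product of batch e-values reaches \(1/\alpha\)". It computes the exact probability under \(H_0\) of ever rejecting, which stays below \(\alpha\) despite the repeated looks.

```python
from math import comb

alpha = 0.05
p0, p1 = 0.5, 0.7                  # hypothetical simple H0 and H1
batch_size, max_batches = 10, 20   # hypothetical sampling plan

def product_e_value(successes, n):
    """Product of the batch likelihood ratios; depends only on the totals."""
    return (p1 / p0)**successes * ((1 - p1) / (1 - p0))**(n - successes)

# Distribution under H0 of the number of successes in one batch
batch_pmf = [comb(batch_size, k) * p0**k * (1 - p0)**(batch_size - k)
             for k in range(batch_size + 1)]

# alive[s]: H0 probability of having s successes so far without having rejected
alive = {0: 1.0}
reject_prob = 0.0
for batch in range(1, max_batches + 1):
    n = batch * batch_size
    still_alive = {}
    for s, prob in alive.items():
        for k, pk in enumerate(batch_pmf):
            total = s + k
            if product_e_value(total, n) >= 1 / alpha:
                reject_prob += prob * pk      # stop and reject H0
            else:
                still_alive[total] = still_alive.get(total, 0.0) + prob * pk
    alive = still_alive                       # continue with a new batch

print(f"P(reject H0 at any look) under H0: {reject_prob:.4f} (alpha = {alpha})")
```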

estimation

  • duality test-CI: \[\{\delta: E_\delta \leq 1/\alpha\}\] is a (conservative) CI for \(\delta\) at level \(1-\alpha\), when for each \(\delta_0\), \(E_{\delta_0}\) is an e-value for \(H_0:\delta=\delta_0\)

  • in analogy with likelihood ratios, \[\arg \min_\delta E_\delta\] is the minimum e-value estimate of \(\delta\)
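
A hedged sketch of both bullets, under assumptions not made in the slides: data \(X_1,\dots,X_n\) i.i.d. \(N(\delta,1)\), and for each \(\delta_0\) the e-value \(E_{\delta_0}\) is the Bayes factor with a \(N(\delta_0,\tau^2)\) prior on \(\delta\) under \(H_1\). The data summary (`xbar`, `n`), `tau`, and the grid are hypothetical. In this model the minimum e-value estimate coincides with the sample mean, and the interval is wider than the standard \(95\%\) z-interval.

```python
from math import exp, sqrt

def e_value(delta0, xbar, n, tau=1.0):
    """Bayes factor for H0: delta = delta0 against a N(delta0, tau^2) prior,
    assuming X_i ~ N(delta, 1); it depends on the data only through xbar."""
    v = 1 + n * tau**2
    return exp(tau**2 * n**2 * (xbar - delta0)**2 / (2 * v)) / sqrt(v)

alpha = 0.05
xbar, n = 0.3, 50                                  # hypothetical data summary
grid = [i / 1000 for i in range(-1000, 1001)]      # candidate values of delta
e_vals = [e_value(d, xbar, n) for d in grid]

# Confidence interval: all delta whose e-value does not exceed 1/alpha
inside = [d for d, e in zip(grid, e_vals) if e <= 1 / alpha]
ci_lower, ci_upper = min(inside), max(inside)

# Minimum e-value estimate of delta
estimate = grid[e_vals.index(min(e_vals))]

print(f"estimate: {estimate}, CI: [{ci_lower}, {ci_upper}]")
```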

sample size

  • in general, conservative tests based on e-values require twice the sample size of standard tests

  • sequential tests based on e-values may require on average smaller sample sizes than standard tests with fixed sample sizes (Grünwald et al., 2024), but they may still compare unfavorably with standard sequential tests (Georgiev, 2022)
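
As a rough illustration of the first bullet in one fixed-sample setting only (an assumption of this sketch, not a result from the cited papers): \(X_i \sim N(\mu,1)\), \(H_0:\mu=0\) against \(H_1:\mu=\delta\) with hypothetical \(\delta=0.5\), \(\alpha=0.05\) and target power \(0.80\). The standard one-sided z-test is compared with the test that rejects when the likelihood-ratio e-value reaches \(1/\alpha\).

```python
from math import ceil, log, sqrt
from statistics import NormalDist

alpha, power, delta = 0.05, 0.80, 0.5   # hypothetical design values
z = NormalDist()

# Standard one-sided z-test: smallest n with the target power
n_z = ceil(((z.inv_cdf(1 - alpha) + z.inv_cdf(power)) / delta) ** 2)

def e_test_power(n):
    """Power of rejecting when exp(delta * sum_x - n * delta^2 / 2) >= 1/alpha,
    using sum_x ~ N(n * delta, n) under H1."""
    threshold = (log(1 / alpha) + n * delta**2 / 2) / delta   # threshold on sum_x
    return 1 - z.cdf((threshold - n * delta) / sqrt(n))

# Smallest n for which the e-value test reaches the target power
n_e = next(n for n in range(1, 10_000) if e_test_power(n) >= power)

print(f"z-test: n = {n_z}, e-value test: n = {n_e}, ratio = {n_e / n_z:.2f}")
```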