safe testing / e-values / anytime-valid inference

Marco Cattaneo

Department of Clinical Research, University of Basel

5 March 2024

e-value

an e-value is a statistic \(E\geq 0\) such that \[\mathbb{E}[E]\leq 1\quad \textsf{under}\ H_0\]
it is a (one-sided) test statistic: we expect small values under \(H_0\) and larger ones under \(H_1\)
examples:
- likelihood ratio \(\mathbb{P}_1(data) / \mathbb{P}_0(data)\) when \(H_0\) and \(H_1\) are simple
- Bayes factor \(\mathbb{P}_1(data) / \mathbb{P}_0(data)\) when \(H_0\) is simple and \(\mathbb{P}_1\) is the marginal probability under a prior on \(H_1\)
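
as a minimal sketch of the two examples, assuming Bernoulli data with \(H_0: p=p_0\) (the setup and all function names are illustrative assumptions, not from the talk), the expectation under \(H_0\) can be computed exactly by summing over the binomial distribution:

```python
from math import comb

def likelihood_ratio(k, n, p0, p1):
    """E-value for simple H0: p = p0 against simple H1: p = p1 (k successes in n trials)."""
    return (p1 / p0) ** k * ((1 - p1) / (1 - p0)) ** (n - k)

def bayes_factor_uniform(k, n, p0):
    """Bayes factor for simple H0: p = p0 against a uniform prior on p under H1."""
    # marginal probability of one sequence with k successes: Beta(k + 1, n - k + 1) integral
    marginal = 1.0 / ((n + 1) * comb(n, k))
    return marginal / (p0 ** k * (1 - p0) ** (n - k))

def expectation_under_h0(e_value, n, p0):
    """Exact E[E] under H0, summing over the binomial distribution of k."""
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) * e_value(k)
               for k in range(n + 1))

n, p0, p1 = 20, 0.5, 0.7  # illustrative values
mean_lr = expectation_under_h0(lambda k: likelihood_ratio(k, n, p0, p1), n, p0)
mean_bf = expectation_under_h0(lambda k: bayes_factor_uniform(k, n, p0), n, p0)
print(mean_lr, mean_bf)  # both equal 1 up to rounding
```

in both cases the bound \(\mathbb{E}[E]\leq 1\) holds with equality, because the e-value is a ratio of two probability distributions on the same data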

Grünwald et al. (2024) introduce GRO (growth-rate optimality) as an alternative to power for selecting the best e-value
it essentially corresponds to maximizing the (worst-case) \[\mathbb{E}[\log(E)]\quad \textsf{under}\ H_1\]
GRO e-values turn out to be Bayes factors with special priors
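
a small sketch of the growth-rate criterion, assuming Bernoulli data with simple \(H_0: p=p_0\) and simple \(H_1: p=p_1\), so that the worst case reduces to a single expectation (the candidate family indexed by `q` and the function name are illustrative assumptions): among likelihood ratios of \(q\) against \(p_0\), the expected log growth under \(H_1\) is largest at \(q=p_1\)

```python
import math

def expected_log_growth(q, p0, p1):
    """E[log E] per observation under H1: p = p1, for the likelihood ratio of q against p0."""
    return p1 * math.log(q / p0) + (1 - p1) * math.log((1 - q) / (1 - p0))

p0, p1 = 0.5, 0.7  # illustrative values
grid = [i / 100 for i in range(1, 100)]  # candidate alternatives q
best_q = max(grid, key=lambda q: expected_log_growth(q, p0, p1))
# at q = p1 the growth rate is the Kullback-Leibler divergence of p1 from p0
kl = expected_log_growth(p1, p0, p1)
print(best_q, kl)
```

every candidate in the grid is a valid e-value under \(H_0\); the criterion only ranks them by how fast evidence accumulates when \(H_1\) is true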

Doob’s optional stopping theorem implies that the product of e-values from independent samples is always an e-value, even when the decision to collect new samples arbitrarily depends on previous samples
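Markov’s inequality implies that rejecting \(H_0\) when \[E\geq 1/\alpha\] is a (conservative) significance test at level \(\alpha\), since \(\mathbb{P}(E\geq 1/\alpha)\leq \alpha\,\mathbb{E}[E]\leq \alpha\) under \(H_0\)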
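
to illustrate how the two statements combine, the sketch below computes exactly (by dynamic programming, not simulation) the probability under \(H_0\) that a running product of likelihood-ratio e-values ever reaches \(1/\alpha\) when the data are checked after every observation; the Bernoulli setup, the stopping rule, and the function name are assumptions made for this example

```python
def crossing_probability(p0, p1, alpha, n_max):
    """Exact P(running likelihood ratio reaches 1/alpha within n_max observations) under H0."""
    up, down = p1 / p0, (1 - p1) / (1 - p0)  # e-value of a single success / failure
    alive = {0: 1.0}  # number of successes -> probability of not having rejected yet
    rejected = 0.0
    for n in range(1, n_max + 1):
        nxt = {}
        for k, prob in alive.items():
            for k_new, step_prob in ((k + 1, p0), (k, 1 - p0)):
                product = up ** k_new * down ** (n - k_new)
                mass = prob * step_prob
                if product >= 1 / alpha:
                    rejected += mass  # stop and reject H0 at the first crossing
                else:
                    nxt[k_new] = nxt.get(k_new, 0.0) + mass
        alive = nxt
    return rejected

alpha = 0.05
prob = crossing_probability(p0=0.5, p1=0.7, alpha=alpha, n_max=200)
print(prob)  # type I error under continuous monitoring
```

the type I error stays below \(\alpha\) although the test is evaluated after every observation, which a fixed-sample test repeated at each look would not guarantee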

duality test-CI: \[\{\delta: E_\delta \leq 1/\alpha\}\] is a (conservative) CI for \(\delta\) at level \(1-\alpha\), when for each \(\delta_0\), \(E_{\delta_0}\) is an e-value for \(H_0:\delta=\delta_0\)
in analogy with likelihood ratios, \[\arg \min_\delta E_\delta\] is the minimum e-value estimate of \(\delta\)
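
a hedged sketch of both constructions, assuming \(X_i\sim N(\delta,1)\) and, for each \(\delta_0\), the Bayes factor with a \(N(\delta_0,\tau^2)\) prior under \(H_1\) as e-value (this choice of e-value, the data, and the function names are assumptions for illustration); here the e-value has a closed form, so the interval can be written down directly

```python
import math

def e_value(xs, delta0, tau=1.0):
    """Bayes factor for H0: delta = delta0 with a N(delta0, tau^2) prior under H1."""
    n = len(xs)
    s = sum(x - delta0 for x in xs)
    shrink = 1.0 + n * tau ** 2
    return math.exp(tau ** 2 * s ** 2 / (2.0 * shrink)) / math.sqrt(shrink)

def e_confidence_interval(xs, alpha=0.05, tau=1.0):
    """Set of delta with e_value(xs, delta) <= 1/alpha, solved in closed form."""
    n = len(xs)
    mean = sum(xs) / n
    shrink = 1.0 + n * tau ** 2
    log_bound = math.log(1.0 / alpha) + 0.5 * math.log(shrink)
    half_width = math.sqrt(2.0 * shrink * log_bound) / (tau * n)
    return mean - half_width, mean + half_width

xs = [0.8, 1.3, 0.4, 1.1, 0.9, 1.6, 0.7, 1.2]  # made-up data
lower, upper = e_confidence_interval(xs)
estimate = sum(xs) / len(xs)  # e_value is increasing in |mean - delta|, so this is the argmin
print(lower, estimate, upper)
```

in this example the minimum e-value estimate coincides with the sample mean, and the interval is wider than the standard interval with half-width \(1.96/\sqrt{n}\), reflecting its conservativeness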

in general, conservative tests based on e-values require double the sample size of standard tests
sequential tests based on e-values may require on average smaller sample sizes than standard tests with fixed sample sizes (Grünwald et al., 2024), but they may still compare unfavorably with standard sequential tests (Georgiev, 2022)