Likelihood-free hypothesis testing
Patrik Róbert Gerber, Yury Polyanskiy
2022-11-02
Abstract: Consider the problem of binary hypothesis testing. Given $Z$ coming from
either $\mathbb P^{\otimes m}$ or $\mathbb Q^{\otimes m}$, to decide between
the two with small probability of error it is sufficient, and in many cases
necessary, to have $m\asymp1/\epsilon^2$, where $\epsilon$ measures the
separation between $\mathbb P$ and $\mathbb Q$ in total variation
($\mathsf{TV}$). Achieving this, however, requires complete knowledge of the
distributions and can be done, for example, using the Neyman-Pearson test. In
this paper we consider a variation of the problem which we call likelihood-free
hypothesis testing, where access to $\mathbb P$ and $\mathbb Q$ is given
through $n$ i.i.d. observations from each. In the case when $\mathbb P$ and
$\mathbb Q$ are assumed to belong to a non-parametric family, we demonstrate
the existence of a fundamental trade-off between $n$ and $m$ given by $nm\asymp
n_{\mathsf{GoF}}^2(\epsilon)$, where $n_{\mathsf{GoF}}(\epsilon)$ is the minimax sample
complexity of testing between the hypotheses $H_0:\, \mathbb P=\mathbb Q$ vs
$H_1:\, \mathsf{TV}(\mathbb P,\mathbb Q)\geq\epsilon$. We show this for three
families of distributions, in addition to the family of all discrete
distributions, for which we obtain a more complicated trade-off exhibiting an
additional phase transition. Our results demonstrate the possibility of testing
without fully estimating $\mathbb P$ and $\mathbb Q$, provided $m \gg
1/\epsilon^2$.
Information Theory, Statistics Theory
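
As a rough illustration of the setting described in the abstract, the following Python sketch simulates likelihood-free hypothesis testing on a small discrete alphabet: the tester sees $n$ samples from each of $\mathbb P$ and $\mathbb Q$ plus $m$ samples $Z$ from one of them, and never uses the true distributions. This is not claimed to be the authors' test. The helper names (`empirical_pmf`, `lfht_decide`, `draw`, `error_rate`), the two example distributions, and the squared-distance decision rule are all assumptions made for illustration.

```python
import random
from collections import Counter


def empirical_pmf(samples, k):
    """Empirical probability mass function on the alphabet {0, ..., k-1}."""
    counts = Counter(samples)
    total = len(samples)
    return [counts.get(j, 0) / total for j in range(k)]


def lfht_decide(x, y, z, k):
    """Return 0 if z looks drawn from the law of x, and 1 otherwise.

    Illustrative rule (an assumption, not taken from the paper): compare the
    squared Euclidean distance from the empirical pmf of z to those of x and y.
    """
    px, py, pz = (empirical_pmf(s, k) for s in (x, y, z))
    stat = sum((c - a) ** 2 - (c - b) ** 2 for a, b, c in zip(px, py, pz))
    return 0 if stat < 0 else 1


def draw(pmf, size, rng):
    """Draw `size` i.i.d. symbols from `pmf` using the given random.Random."""
    return rng.choices(range(len(pmf)), weights=pmf, k=size)


def error_rate(p, q, n, m, trials, rng):
    """Monte Carlo estimate of the error, averaged over both hypotheses."""
    k = len(p)
    errors = 0
    for t in range(trials):
        truth = t % 2  # alternate which distribution generates z
        x = draw(p, n, rng)  # n simulated samples from P
        y = draw(q, n, rng)  # n simulated samples from Q
        z = draw(q if truth else p, m, rng)  # m observations to classify
        errors += lfht_decide(x, y, z, k) != truth
    return errors / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical example distributions with TV(p, q) = 0.2.
    p = [0.3, 0.3, 0.2, 0.2]
    q = [0.2, 0.2, 0.3, 0.3]
    for n, m in [(50, 50), (50, 400), (400, 50), (400, 400)]:
        rate = error_rate(p, q, n, m, trials=2000, rng=rng)
        print(f"n={n:4d}  m={m:4d}  estimated error={rate:.3f}")
```

Running the script prints an estimated error probability for a few $(n, m)$ pairs. Varying $n$ and $m$ gives a feel for how the two sample sizes interact, though a toy simulation of this kind is only suggestive and does not verify the minimax rate $nm\asymp n_{\mathsf{GoF}}^2(\epsilon)$.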