Abstract: Score-based statistical models play an important role in modern machine learning, statistics, and signal processing. For hypothesis testing, a score-based hypothesis test is proposed in \cite{wu2022score}. We analyze the performance of this score-based hypothesis testing procedure and derive upper bounds on the probabilities of its Type I and II errors. We prove that the exponents of our error bounds are asymptotically (in the number of samples) tight for the case of simple null and alternative hypotheses. We calculate these error exponents explicitly in specific cases and provide numerical studies for various other scenarios of interest.

What problem does this paper attempt to address?

The paper studies score-based hypothesis testing, motivated by the difficulty of working with unnormalized models and score models in modern machine learning. The traditional likelihood ratio test (LRT) is optimal when the data density is known, but in many complex models the exact likelihood is difficult to compute. The paper therefore analyzes the score-based binary hypothesis test proposed in \cite{wu2022score}, which relies on the Hyvärinen score instead of the likelihood ratio.

The paper first highlights the importance of score matching, especially in image generation tasks, where it outperforms likelihood-based methods. It then points out that the gradient score can be learned even when the density of the data distribution is unknown, whereas the exact likelihood of an unnormalized model cannot be computed directly.

The paper derives upper bounds on the Type I error (false positive) and Type II error (false negative) probabilities of the score-based test for finite samples, and proves that the exponents of these bounds are tight for simple null and alternative hypotheses as the sample size approaches infinity. The tightness result is obtained using large deviation theory, and numerical simulations are used to estimate the error exponents.

Specifically, the paper calculates closed-form expressions of the error exponents for multivariate Gaussian distributions and conducts numerical experiments with synthetic data (multivariate normal distributions, exponential families, and Gaussian-Bernoulli restricted Boltzmann machines) as well as real-world data (the KDD Cup'99 network security dataset). The experimental results are consistent with the theoretical predictions: as the sample size increases, the empirical error exponent approaches the theoretical limit.

In summary, the paper addresses hypothesis testing when the exact density of the data is unknown, analyzes a score-based test for this setting, and provides theoretical analysis and empirical validation of its performance.
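To make the idea concrete, here is a minimal sketch of a Hyvärinen-score test between two multivariate Gaussians. It assumes the standard definition of the Hyvärinen score, S_H(x, p) = (1/2)||∇ log p(x)||² + Δ log p(x), and assumes the test decides for the alternative when the summed score difference exceeds a threshold. The function names, the zero threshold, and the example parameters are illustrative choices, not taken from the paper.

```python
import numpy as np


def hyvarinen_score_gaussian(x, mean, cov):
    """Hyvarinen score of N(mean, cov) evaluated at each row of x.

    Uses 0.5 * ||grad log p(x)||^2 + Laplacian of log p(x), which for a
    Gaussian needs only the precision matrix, not the normalizing constant.
    """
    prec = np.linalg.inv(cov)
    grad = -(x - mean) @ prec      # rows are grad log p(x_i); prec is symmetric
    laplacian = -np.trace(prec)    # constant in x for a Gaussian
    return 0.5 * np.sum(grad ** 2, axis=1) + laplacian


def score_test(x, null, alt, threshold=0.0):
    """Return (decide_alt, statistic) for samples x (one sample per row).

    null and alt are (mean, cov) pairs. The statistic is the summed
    difference of Hyvarinen scores; a large value favors the alternative.
    The threshold is a free parameter (assumed 0 here for illustration).
    """
    stat = np.sum(
        hyvarinen_score_gaussian(x, *null) - hyvarinen_score_gaussian(x, *alt)
    )
    return bool(stat > threshold), float(stat)


# Illustrative setup (hypothetical parameters): two unit-covariance Gaussians.
rng = np.random.default_rng(0)
null = (np.zeros(2), np.eye(2))
alt = (np.ones(2), np.eye(2))

x_null = rng.multivariate_normal(*null, size=200)
x_alt = rng.multivariate_normal(*alt, size=200)

decide_on_null, stat_on_null = score_test(x_null, null, alt)
decide_on_alt, stat_on_alt = score_test(x_alt, null, alt)
```

With equal covariances the per-sample score difference reduces to a linear function of x, so in this special case the statistic coincides with the log-likelihood ratio; the two tests generally differ when the covariances differ.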

Large Deviation Analysis of Score-based Hypothesis Testing

Proof of Han's Hypothesis on Relations among Two-Sided M-Bayesian Credible Limits and Corresponding Two-Sided Classical Confidence Limits

A General Theory of Hypothesis Tests and Confidence Regions for Sparse High Dimensional Models

A NEW PERSPECTIVE ON ROBUST M-ESTIMATION: FINITE SAMPLE THEORY AND APPLICATIONS TO DEPENDENCE-ADJUSTED MULTIPLE TESTING

Sub-Gaussian Error Bounds for Hypothesis Testing

Universal Hypothesis Testing with Kernels: Asymptotically Optimal Tests for Goodness of Fit

Cramér-type moderate deviations for Studentized two-sample $U$-statistics with applications

How can the score test be consistent?

Finite-sample expansions for the optimal error probability in asymmetric binary hypothesis testing

Bayesian model comparison with the Hyvärinen score: computation and consistency

Reprint: Hypothesis testing on high dimensional quantile regression

A Cramér moderate deviation theorem for Hotelling's $T^2$-statistic with applications to global tests

Interpretation and Generalization of Score Matching

Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions

Nonparametric Score Estimators

Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling

Generalized score test of homogeneity for mixed effects models

Generalized Score Matching

Convergence Analysis of Probability Flow ODE for Score-based Generative Models

Unification of Rare/Weak Detection Models using Moderate Deviations Analysis and Log-Chisquared P-values