Abstract: Multivariate, heteroscedastic errors complicate statistical inference in many large-scale denoising problems. Empirical Bayes is attractive in such settings, but standard parametric approaches rest on assumptions about the form of the prior distribution which can be hard to justify and which introduce unnecessary tuning parameters. We extend the nonparametric maximum likelihood estimator (NPMLE) for Gaussian location mixture densities to allow for multivariate, heteroscedastic errors. NPMLEs estimate an arbitrary prior by solving an infinite-dimensional, convex optimization problem; we show that this convex optimization problem can be tractably approximated by a finite-dimensional version. The empirical Bayes posterior means based on an NPMLE have low regret, meaning they closely target the oracle posterior means one would compute with the true prior in hand. We prove an oracle inequality implying that the empirical Bayes estimator performs at nearly the optimal level (up to logarithmic factors) for denoising without prior knowledge. We provide finite-sample bounds on the average Hellinger accuracy of an NPMLE for estimating the marginal densities of the observations. We also demonstrate the adaptive and nearly optimal properties of NPMLEs for deconvolution. We apply our method to two denoising problems in astronomy, constructing a fully data-driven color-magnitude diagram of 1.4 million stars in the Milky Way and investigating the distribution of 19 chemical abundance ratios for 27 thousand stars in the red clump. We also apply our method to hierarchical linear models, illustrating the advantages of nonparametric shrinkage of regression coefficients on an education data set and on a microarray data set.
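The finite-dimensional approximation and the resulting empirical Bayes posterior means can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the prior is supported on a fixed set of atoms (here the observations themselves, one possible choice), and it fits the mixing weights with a plain EM iteration rather than a dedicated convex solver. The function names `gaussian_likelihood_matrix`, `fit_npmle_weights`, and `posterior_means` are hypothetical.

```python
import numpy as np

def gaussian_likelihood_matrix(X, covs, atoms):
    """L[i, j] = N(X[i]; atoms[j], covs[i]), one error covariance per observation."""
    n, d = X.shape
    prec = np.linalg.inv(covs)                   # (n, d, d) stacked inverses
    _, logdet = np.linalg.slogdet(covs)          # (n,) log-determinants
    diff = X[:, None, :] - atoms[None, :, :]     # (n, m, d)
    maha = np.einsum("imk,ikl,iml->im", diff, prec, diff)
    log_lik = -0.5 * (maha + logdet[:, None] + d * np.log(2 * np.pi))
    return np.exp(log_lik)

def fit_npmle_weights(L, n_iter=500):
    """EM updates for mixing weights on fixed atoms (assumed solver, not the paper's)."""
    n, m = L.shape
    w = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        marginal = L @ w                         # estimated marginal density at each X[i]
        w = w * (L.T @ (1.0 / marginal)) / n     # multiplicative EM step, keeps sum(w) == 1
    return w

def posterior_means(L, w, atoms):
    """Empirical Bayes posterior mean of each latent location under the fitted prior."""
    post = L * w                                 # unnormalized posterior over atoms
    post /= post.sum(axis=1, keepdims=True)
    return post @ atoms

# Simulated example: two-point prior, diagonal heteroscedastic error covariances.
rng = np.random.default_rng(0)
n, d = 200, 2
theta = rng.choice([-2.0, 2.0], size=(n, 1)) * np.ones((1, d))
variances = rng.uniform(0.5, 2.0, size=(n, d))
covs = np.stack([np.diag(v) for v in variances])
X = theta + rng.standard_normal((n, d)) * np.sqrt(variances)

L = gaussian_likelihood_matrix(X, covs, atoms=X)  # atoms at the data points (assumption)
w = fit_npmle_weights(L)
theta_hat = posterior_means(L, w, atoms=X)
```

With the atoms fixed, the log-likelihood is concave in the weights, so the EM iteration never decreases it; a production implementation would more likely hand the same problem to a convex optimization solver and choose the atom grid with more care.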

Learning from a lot: Empirical Bayes in high-dimensional prediction settings

Empirical Bayes in Bayesian learning: understanding a common practice

Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm

Empirical Bayes inference in sparse high-dimensional generalized linear models

Big Learning with Bayesian Methods

High-dimensional prediction for count response via sparse exponential weights

Scalable Bayesian regression in high dimensions with multiple data sources

Adaptive Bayesian Predictive Inference in High-dimensional Regression

Inference algorithms and learning theory for Bayesian sparse factor analysis

Empirical Bayes for Large-scale Randomized Experiments: a Spectral Approach

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Multivariate, Heteroscedastic Empirical Bayes via Nonparametric Maximum Likelihood

Semi-supervised empirical Bayes group-regularized factor regression

Empirical Bayes large-scale multiple testing for high-dimensional binary outcome data

Bayesian High-dimensional Linear Regression with Sparse Projection-posterior

A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression

The winner's curse under dependence: repairing empirical Bayes using convoluted densities

Distributional Robustness and Transfer Learning Through Empirical Bayes

Nonparametric Bayes Classification via Learning of Affine Subspaces

Bayesian Model Selection for High-Dimensional Ising Models, With Applications to Educational Data

Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem