Abstract: When Student's t-statistic was introduced a century ago, no one could have imagined how widely it would be applied in the modern era. It is now used in large-scale multiple hypothesis testing, feature selection and ranking, and high-dimensional signal detection, among other areas. Student's t-statistic is constructed from the empirical distribution function (EDF). An alternative to the EDF is the kernel density estimate (KDE), which corresponds to a smoothed version of the EDF. The novelty of this work lies in an alternative to Student's t-test based on the KDE, and in an exploration of the usefulness of the KDE-based t-test for large-scale simultaneous hypothesis testing. An optimal bandwidth parameter for the KDE approach is derived by minimizing the asymptotic error between the true p-value and its asymptotic estimate based on the normal approximation. When the KDE-based approach is used for large-scale simultaneous testing, a natural question is when the method fails to control the error rate. We show that the suggested KDE-based method can control the false discovery rate (FDR) if the total number of tests diverges at a smaller order of magnitude than $N^{3/2}$, where $N$ is the total sample size. We compare our method to several possible alternatives with respect to FDR. In simulations, our method produces a lower proportion of false discoveries than its competitors; that is, it controls the FDR better. These empirical studies show that the proposed method can be applied successfully in practice. Its usefulness is further illustrated through a gene expression data example.
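
As a rough illustration of the pipeline described in the abstract, the following Python sketch computes a KDE-smoothed one-sample t-type statistic for each of many tests, converts it to a two-sided p-value with the normal approximation, and applies the Benjamini-Hochberg step-up rule. It is not the authors' implementation. It assumes a Gaussian kernel, for which the variance of the KDE equals the biased sample variance plus $h^2$; the bandwidth `h` is a placeholder supplied by the caller, not the optimal bandwidth derived in the paper. The function names `kde_t_pvalues` and `benjamini_hochberg`, and the choice of Benjamini-Hochberg as the FDR procedure, are likewise assumptions made for illustration.

```python
import math
import numpy as np


def kde_t_pvalues(X, h):
    """Two-sided normal-approximation p-values for row-wise tests of mean zero.

    X : (m, n) array, one row of n observations per hypothesis.
    h : kernel bandwidth (hypothetical value chosen by the caller).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    mean = X.mean(axis=1)
    # Assumption: Gaussian kernel, so the KDE's variance is the biased
    # sample variance plus h**2 (the KDE's mean equals the sample mean).
    var_kde = X.var(axis=1) + h ** 2
    t = math.sqrt(n) * mean / np.sqrt(var_kde)
    # Two-sided normal tail: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])


def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean rejection mask from the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest index meeting its threshold.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated data: 1000 tests, 20 observations each, first 50 with shifted mean.
    X = rng.normal(size=(1000, 20))
    X[:50] += 1.5
    rejected = benjamini_hochberg(kde_t_pvalues(X, h=0.3), alpha=0.05)
    print("discoveries:", int(rejected.sum()))
```

Under these assumptions, a larger bandwidth inflates the variance estimate and shrinks the statistic toward zero, so the test becomes more conservative; this is one way to see why the choice of bandwidth matters for error-rate control.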

Large-Scale Simultaneous Testing Using Kernel Density Estimation