Abstract: In analyzing high-dimensional models, sparsity of the model parameter is a common but often undesirable assumption. In this paper, we study the following two-sample testing problem: given two samples generated by two high-dimensional linear models, we aim to test whether the regression coefficients of the two linear models are identical. We propose a framework named TIERS (short for TestIng Equality of Regression Slopes), which solves the two-sample testing problem without making any assumptions on the sparsity of the regression parameters. TIERS builds a new model by convolving the two samples in such a way that the original hypothesis translates into a new moment condition. A self-normalization construction is then developed to form a moment test. We provide rigorous theory for the developed framework. Under very weak conditions on the feature covariance, we show that the accuracy of the proposed test in controlling Type I errors is robust both to the lack of sparsity in the features and to the heavy tails in the error distribution, even when the sample size is much smaller than the feature dimension. Moreover, we discuss minimax optimality and efficiency properties of the proposed test. Simulation analysis demonstrates excellent finite-sample performance of our test. In deriving the test, we also develop tools that are of independent interest. The test is built upon a novel estimator, called Auto-aDaptive Dantzig Selector (ADDS), which not only automatically chooses an appropriate scale of the error term but also incorporates prior information. To effectively approximate the critical value of the test statistic, we develop a novel high-dimensional plug-in approach that complements the recent advances in Gaussian approximation theory.
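
To make the idea of combining the two samples concrete, the following is a minimal sketch of one way equality of slopes can be rewritten as a condition on a single parameter. It rests on assumptions that the abstract does not state: both samples have the same size $n$, and the symbols $W$, $Z$, $\theta$, $\gamma$ are introduced here purely for illustration. It is an algebraic identity, not a reproduction of the TIERS construction.

```latex
% Illustrative notation (assumed, not taken from the abstract); equal sample sizes n.
\begin{align*}
y_A &= X_A \beta_A + \varepsilon_A, \qquad y_B = X_B \beta_B + \varepsilon_B, \qquad H_0 : \beta_A = \beta_B, \\
W &= X_A + X_B, \quad Z = X_A - X_B, \quad \theta = \tfrac{1}{2}(\beta_A + \beta_B), \quad \gamma = \tfrac{1}{2}(\beta_A - \beta_B), \\
y_A + y_B &= W\theta + Z\gamma + (\varepsilon_A + \varepsilon_B), \qquad H_0 \iff \gamma = 0 .
\end{align*}
```

Under $H_0$ the term $Z\gamma$ vanishes, so the null hypothesis becomes a statement about the single parameter $\gamma$ and does not require $\beta_A$ or $\beta_B$ to be sparse. The moment condition and self-normalized statistic mentioned in the abstract are not reproduced in this sketch.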

Incorporation of Sparsity Information in Large-scale Multiple Two-sample $t$ Tests

Tests for a Multiple-Sample Problem in High Dimensions

Large-Scale Simultaneous Testing Using Kernel Density Estimation

A New Procedure for Controlling False Discovery Rate in Large-Scale t-tests

A New Perspective on Robust M-Estimation: Finite Sample Theory and Applications to Dependence-Adjusted Multiple Testing

Two-sample testing in non-sparse high-dimensional linear models

Robustness and accuracy of methods for high dimensional data analysis based on Student's t statistic

Asymptotic Uncertainty of False Discovery Proportion for Dependent $t$-Tests

Finite sample t-tests for high-dimensional means

Sparse-limit approximation for t-statistics

Heteroscedasticity-Adjusted Ranking and Thresholding for Large-Scale Multiple Testing

Simultaneous critical values for $t$-tests in very high dimensions

High-dimensional Two-Sample Mean Vectors Test and Support Recovery with Factor Adjustment

Multiple two-sample testing under arbitrary covariance dependency with an application in imaging mass spectrometry

Consistent estimation of the proportion of false nulls and FDR for adaptive multiple testing Normal means under weak dependence

On Controlling the False Discovery Rate in Multiple Testing of the Means of Correlated Normals Against Two-Sided Alternatives

Empirical Bayes large-scale multiple testing for high-dimensional binary outcome data

LAWS: A Locally Adaptive Weighting and Screening Approach to Spatial Multiple Testing

An exact projection pursuit-based algorithm for multivariate two-sample nonparametric testing applicable to retrospective and group sequential studies

Directional FDR Control for Sub-Gaussian Sparse GLMs

An adaptable generalization of Hotelling's $T^2$ test in high dimension