Inference on testing the number of spikes in a high-dimensional generalized two-sample spiked model and its applications

Rui Wang, Dandan Jiang
2024-01-08
Abstract: The two-sample spiked model is an important topic in multivariate statistical inference. This paper focuses on testing the number of spikes in a high-dimensional generalized two-sample spiked model, which requires neither a Gaussian population assumption nor a diagonal or block-wise diagonal population covariance matrix, and in which the spiked eigenvalues need not be bounded. To determine the number of spikes, we first propose a general test based on partial linear spectral statistics and establish its asymptotic normality under the null hypothesis. We then apply this result to two statistical problems: variable selection in large-dimensional linear regression, and change-point detection when change points and additive outliers exist simultaneously. Simulations and empirical analysis illustrate the good performance of our methods.
Statistics Theory
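The abstract does not give the statistic's exact form, centering term or variance. The Python sketch below therefore only illustrates its ingredients, under stated assumptions.

- **Fisher matrix.** We assume the two-sample spiked model is read through the sample Fisher matrix S1 S2^{-1}, as is common for two-sample spiked models.
- **Hypothetical statistic.** We take the partial linear spectral statistic to be the sum of f(λ_j) over all eigenvalues except the M0 largest. The choice f = log is arbitrary.
- **Illustrative helpers.** The helpers `simulate` and `partial_lss`, and the 20% margin in the outlier count, are our own inventions, not the authors' procedure.
- **Bulk edge.** `wachter_right_edge` uses the standard right edge of the limiting Fisher-matrix spectrum for p < n2.

```python
import numpy as np


def fisher_eigenvalues(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Eigenvalues of the sample Fisher matrix S1 S2^{-1}, in descending order.

    x1 has shape (n1, p), x2 has shape (n2, p); both samples are assumed to
    have mean zero, so S_k = X_k^T X_k / n_k (an assumption of this sketch).
    """
    s1 = x1.T @ x1 / x1.shape[0]
    s2 = x2.T @ x2 / x2.shape[0]
    # S2^{-1} S1 is similar to S1 S2^{-1}, so both share the same real, positive
    # eigenvalues; solve() avoids forming the inverse explicitly.
    eig = np.linalg.eigvals(np.linalg.solve(s2, s1))
    return np.sort(eig.real)[::-1]


def partial_lss(eigs: np.ndarray, m0: int, f=np.log) -> float:
    """Hypothetical partial linear spectral statistic: sum of f over all
    eigenvalues except the m0 largest (the candidate spikes under H0)."""
    return float(np.sum(f(eigs[m0:])))


def wachter_right_edge(p: int, n1: int, n2: int) -> float:
    """Right edge of the limiting spectrum of S1 S2^{-1}; requires p < n2."""
    y1, y2 = p / n1, p / n2
    h = np.sqrt(y1 + y2 - y1 * y2)
    return float(((1 + h) / (1 - y2)) ** 2)


def simulate(p=100, n1=500, n2=500, spikes=(10.0, 8.0), seed=0):
    """Two non-Gaussian samples with Sigma2 = I and Sigma1 = Q diag(d) Q^T,
    where d holds the spikes followed by ones and Q is a random orthogonal
    matrix, so Sigma1 is not diagonal (hypothetical simulation design)."""
    rng = np.random.default_rng(seed)
    d = np.ones(p)
    d[: len(spikes)] = spikes
    q, _ = np.linalg.qr(rng.standard_normal((p, p)))
    a = q * np.sqrt(d)  # scales columns of q, so a @ a.T = q diag(d) q^T
    # Uniform(-sqrt(3), sqrt(3)) entries: mean 0, variance 1, not Gaussian.
    z1 = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n1, p))
    z2 = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n2, p))
    return z1 @ a.T, z2


x1, x2 = simulate()
eigs = fisher_eigenvalues(x1, x2)
edge = wachter_right_edge(p=100, n1=500, n2=500)  # about 4.0 when y1 = y2 = 0.2
# Crude count of eigenvalues beyond the bulk with a hypothetical 20% margin;
# this is NOT the paper's test, only a sanity check on the simulated spikes.
n_outliers = int(np.sum(eigs > 1.2 * edge))
# Raw partial LSS under H0: M0 = 2; the paper's centering and scaling are omitted.
stat = partial_lss(eigs, m0=2)
print(n_outliers, stat)
```

To turn the raw value `stat` into a test, it would still need the paper's centering term and asymptotic variance before comparison with a normal quantile. The sketch only shows which eigenvalues enter the statistic once the M0 hypothesized spikes are set aside.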
What problem does this paper attempt to address?
The paper addresses how to test the number of spikes in a high-dimensional generalized two-sample spiked model. Specifically, the authors determine the number of spikes without assuming a Gaussian population and without requiring the population covariance matrix to be diagonal or block-wise diagonal. They propose a general test based on partial linear spectral statistics and establish its asymptotic normality under the null hypothesis. They then apply this result to two statistical problems:

- variable selection in large-dimensional linear regression;
- change-point detection when change points and additive outliers exist simultaneously.

### Analysis of the Core Problems in the Paper

1. **Background and Problem Definition**
   - **Background**: The spiked model was originally proposed by Johnstone et al. to describe the phenomenon in which a few extreme eigenvalues of certain matrices are well separated from the rest. The model plays an important role in multivariate statistical inference and is widely used in modern fields such as wireless communication and speech recognition.
   - **Problem**: For high-dimensional data, estimating the number of spikes matters for both statistical inference and applications. It helps determine the latent dimension of the data and reconstruct the population covariance structure. However, few studies address this estimation in the two-sample spiked model, even though the problem is crucial for dimension reduction.
2. **Research Objectives**
   - **Objectives**: To propose a general test for the number of spikes in the high-dimensional generalized two-sample spiked model. The test should depend neither on the specific form of the population distribution nor on a diagonal or block-wise diagonal covariance structure.
3. **Methods and Results**
   - **Methods**: A test based on partial linear spectral statistics, with its asymptotic normality established under the null hypothesis. The paper also provides an accurate numerical evaluation of the centering term.
   - **Applications**: The method is applied to two statistical problems:
     - **Variable selection in large-dimensional linear regression**: important regressors are selected by testing the number of nonzero coefficients.
     - **Change-point detection**: change points are located when change points and additive outliers exist simultaneously.
4. **Advantages and Contributions**
   - **Advantages**: Compared with existing methods, the proposed test is more general. It does not depend on the specific form of the population distribution or on diagonal or block-wise diagonal covariance assumptions. It also performs well in simulations and empirical analysis.
   - **Contributions**: The paper provides a new test backed by asymptotic theory and demonstrates its effectiveness in applications, in particular variable selection in large-dimensional linear regression and change-point detection.

### Conclusion

The paper proposes a general method for testing the number of spikes in the high-dimensional generalized two-sample spiked model. It supports the method with asymptotic theory and with applications to variable selection in large-dimensional linear regression and to change-point detection.