Abstract: This paper proposes procedures for testing the equality hypothesis and the proportionality hypothesis involving a large number of $q$ covariance matrices of dimension $p\times p$. Under a limiting scheme where $p$, $q$ and the sample sizes from the $q$ populations grow to infinity in a proper manner, the proposed test statistics are shown to be asymptotically normal. Simulation results show that finite sample properties of the test procedures are satisfactory under both the null and alternatives. As an application, we derive a test procedure for the Kronecker product covariance specification for transposable data. Empirical analysis of datasets from the Mouse Aging Project and the 1000 Genomes Project (phase 3) is also conducted.
What problem does this paper attempt to address?
The paper addresses hypothesis testing for the equality and proportionality of multiple population covariance matrices in the high-dimensional setting, where the dimension \(p\) of each covariance matrix is large relative to the available sample sizes and the number of populations \(q\) is also large. Specifically:
1. **Hypothesis testing for the equality of multiple covariance matrices**:
- Problem description: Samples are drawn from \(q\) different populations, and population \(i\) has its own covariance matrix \(\Sigma_i\). The question is whether these \(q\) covariance matrices are all equal.
- Hypothesis form: \(H_0:\Sigma_1 = \Sigma_2=\cdots=\Sigma_q\).
2. **Hypothesis testing for the proportionality of multiple covariance matrices**:
- Problem description: In the same setting, the question is whether the \(q\) covariance matrices are equal up to unknown positive scaling factors, that is, whether they share a common shape but may differ in overall scale.
- Hypothesis form: \(H_0:\Sigma_i = a_{ij}\Sigma_j\) for all \(i\neq j\), where \(a_{ij}>0\) are unknown constants.
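The proportionality hypothesis can be read as an equality hypothesis on rescaled matrices: if \(\Sigma_i = a_{ij}\Sigma_j\), then \(\Sigma_i/\operatorname{tr}(\Sigma_i) = \Sigma_j/\operatorname{tr}(\Sigma_j)\), with \(a_{ij} = \operatorname{tr}(\Sigma_i)/\operatorname{tr}(\Sigma_j)\). The following is a minimal sketch of that identity at the population level. It is not the paper's test statistic, and the matrices, scaling constants, and helper names are all hypothetical choices made for illustration.

```python
# Illustrative only: population-level matrices, not the paper's test statistic.

def scale(matrix, c):
    """Return c * matrix for a square matrix stored as nested lists."""
    return [[c * x for x in row] for row in matrix]

def trace_normalize(matrix):
    """Rescale a covariance matrix so that its trace equals its dimension p."""
    p = len(matrix)
    tr = sum(matrix[i][i] for i in range(p))
    return [[p * x / tr for x in row] for row in matrix]

def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical common shape matrix and hypothetical unknown scale constants.
base = [[2.0, 0.5], [0.5, 1.0]]
scales = [1.0, 3.0, 0.25]
proportional = [scale(base, a) for a in scales]

# Under proportionality, trace normalization removes the unknown constants.
reference = trace_normalize(proportional[0])
gaps_null = [max_abs_diff(reference, trace_normalize(s)) for s in proportional]

# A matrix with a different shape (here the identity) stays different.
identity = [[1.0, 0.0], [0.0, 1.0]]
gap_alt = max_abs_diff(reference, trace_normalize(identity))

print("gaps under proportionality:", gaps_null)
print("gap for a non-proportional matrix:", gap_alt)
```

With sample covariance matrices in place of the population matrices, the gaps would no longer be exactly zero under the null, which is why a formal test with a known limiting distribution is needed.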
### Research background and motivation
In high-dimensional data analysis, traditional statistical methods often perform poorly or are no longer applicable, which makes hypothesis testing for high-dimensional covariance matrices an important problem. The difficulty is greater when the number of populations \(q\) is also large, because existing multi-sample testing methods cannot be applied directly in that case. This paper aims to fill this gap by proposing tests suited to a large number of populations (large \(q\)) and high-dimensional data (large \(p\)).
### Application scenarios
The paper mentions two practical application scenarios:
- **Mouse Aging Project dataset**: 16 populations with 40 samples each and variable dimension \(p = 46\). The question is whether the covariance matrices of these 16 populations are proportional.
- **1000 Genomes Project dataset (phase 3)**: 26 populations with sample sizes between 64 and 108 and variable dimension \(p = 112515\). The question is whether the covariance matrices of these 26 populations are equal.
The paper applies the proposed tests to these two datasets to illustrate how they can be used in practice.
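The dimensions quoted for the two datasets show why both fall in the high-dimensional regime. A sample covariance matrix computed from \(n\) observations with an estimated mean has rank at most \(n-1\), so it is singular whenever \(p > n-1\), and classical procedures that rely on its determinant or inverse break down. The sketch below only does this bookkeeping; the `Design` class and helper names are hypothetical, and the numbers are the ones listed above.

```python
from dataclasses import dataclass

@dataclass
class Design:
    """Summary of a multi-population dataset (hypothetical helper type)."""
    name: str
    q: int       # number of populations
    p: int       # variable dimension
    n_min: int   # smallest per-population sample size
    n_max: int   # largest per-population sample size

def max_rank(p, n):
    """Upper bound on the rank of a p x p sample covariance from n observations."""
    return min(p, n - 1)

def is_singular(p, n):
    """True when the sample covariance matrix cannot have full rank."""
    return max_rank(p, n) < p

designs = [
    Design("Mouse Aging Project", q=16, p=46, n_min=40, n_max=40),
    Design("1000 Genomes Project (phase 3)", q=26, p=112515, n_min=64, n_max=108),
]

for d in designs:
    # Use the largest sample size: if even that one is singular, all of them are.
    print(d.name)
    print("  p / n_max        :", d.p / d.n_max)
    print("  rank bound       :", max_rank(d.p, d.n_max), "out of", d.p)
    print("  always singular? :", is_singular(d.p, d.n_max))
```

In both cases the dimension exceeds the per-population sample size, so every individual sample covariance matrix is rank deficient.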