Rank-adaptive covariance testing with applications to genomics and neuroimaging

David Veitch, Yinqiu He, Jun Young Park
2024-11-22
Abstract: In biomedical studies, testing for differences in covariance offers scientific insights beyond mean differences, especially when differences are driven by complex joint behavior between features. However, when differences in joint behavior are weakly dispersed across many dimensions and arise from differences in low-rank structures within the data, as is often the case in genomics and neuroimaging, existing two-sample covariance testing methods may suffer from power loss. The Ky-Fan(k) norm, defined by the sum of the top k singular values, is a simple and intuitive matrix norm able to capture signals caused by differences in low-rank structures between matrices, but its statistical properties in hypothesis testing have not been well studied. In this paper, we investigate the behavior of the Ky-Fan(k) norm in two-sample covariance testing. Ultimately, we propose a novel methodology, Rank-Adaptive Covariance Testing (RACT), which is able to leverage differences in low-rank structures found in the covariance matrices of two groups in order to maximize power. RACT uses permutation for statistical inference, ensuring exact Type I error control. We validate RACT in simulation studies and evaluate its performance when testing for differences in gene expression networks between two types of lung cancer, as well as testing for covariance heterogeneity in diffusion tensor imaging (DTI) data taken on two different scanner types.
Methodology, Statistics Theory, Applications
### What problem does this paper attempt to solve?

This paper addresses the loss of power that existing two-sample covariance tests may suffer in biomedical research when the covariance matrices of two groups differ in their low-rank structure. Specifically:

1. **Background and motivation**:
   - In biomedical research, comparing the covariance of two groups of samples can provide scientific insights beyond mean differences, especially when the differences are driven by complex joint behavior among features.
   - For high-dimensional data such as genomics and neuroimaging, low-rank structure in the covariance matrix (i.e., the main information is concentrated in a few principal components) is common.
   - Existing two-sample covariance tests (such as those based on the Frobenius norm or the trace) perform poorly when the differences arise from low-rank structures, which can reduce test power.

2. **Application of the Ky-Fan(k) norm**:
   - The Ky-Fan(k) norm is defined as the sum of the \( k \) largest singular values of a matrix and can effectively capture differences between low-rank structures.
   - However, the statistical properties of the Ky-Fan(k) norm in hypothesis testing have not been fully studied.

3. **The proposed new method**:
   - The paper proposes a new method named Rank-Adaptive Covariance Testing (RACT), which maximizes test power by adaptively selecting an appropriate value of \( k \).
   - RACT uses permutation for statistical inference, which ensures exact control of the Type I error rate in finite samples.

4. **Specific applications**:
   - The authors validated RACT in simulation studies and applied it to two real data sets:
     - Comparing gene expression networks between two types of lung cancer (lung squamous cell carcinoma, LUSC, and lung adenocarcinoma, LUAD).
     - Testing for covariance heterogeneity in diffusion tensor imaging (DTI) data acquired on two different scanner types.

### Summary

The main objective of the paper is to develop a new two-sample covariance test, RACT, that handles the low-rank structural differences common in genomics and neuroimaging data, thereby improving the statistical power of the test.
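To make the two ingredients named above concrete, here is a minimal Python sketch of the Ky-Fan(k) norm and of a permutation test built on it. It is an illustration under stated assumptions, not the authors' implementation: the function names `ky_fan_norm` and `permutation_pvalue` are hypothetical, the statistic is the Ky-Fan(k) norm of the difference between the two sample covariance matrices for a single fixed `k`, and the rank-adaptive step that RACT uses to choose among values of `k` is not reproduced here because the text above does not spell it out.

```python
import numpy as np


def ky_fan_norm(matrix, k):
    """Ky-Fan(k) norm: the sum of the k largest singular values."""
    # np.linalg.svd returns singular values sorted in descending order.
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(singular_values[:k].sum())


def permutation_pvalue(x, y, k, n_perm=500, seed=0):
    """Permutation p-value for a fixed-k covariance difference statistic.

    x, y: arrays of shape (n_samples, n_features), with n_features >= 2.
    Assumption: the statistic is the Ky-Fan(k) norm of the difference
    between the two sample covariance matrices. This is a simplified
    stand-in and omits RACT's adaptive choice of k.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]

    def statistic(a, b):
        diff = np.cov(a, rowvar=False) - np.cov(b, rowvar=False)
        return ky_fan_norm(diff, k)

    observed = statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle group labels by permuting the rows of the pooled sample.
        idx = rng.permutation(pooled.shape[0])
        permuted = statistic(pooled[idx[:n_x]], pooled[idx[n_x:]])
        exceed += int(permuted >= observed)
    # Counting the observed statistic itself keeps the p-value above zero.
    return (1 + exceed) / (1 + n_perm)
```

For example, `ky_fan_norm(np.diag([3.0, 2.0, 1.0]), 2)` sums the two largest singular values, 3 and 2. Setting `k` to the full dimension gives the nuclear norm, and `k = 1` gives the spectral norm, so the choice of `k` controls how many leading directions of the covariance difference contribute to the statistic.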