Multiple two-sample testing under arbitrary covariance dependency with an application in imaging mass spectrometry

Vladimir Vutov, Thorsten Dickhaus
DOI: https://doi.org/10.48550/arXiv.2108.08123
2021-08-18
Abstract: Large-scale hypothesis testing has become a ubiquitous problem in high-dimensional statistical inference, with broad applications in various scientific disciplines. One relevant application is constituted by imaging mass spectrometry (IMS) association studies, where a large number of tests are performed simultaneously in order to identify molecular masses that are associated with a particular phenotype, e.g., a cancer subtype. Mass spectra obtained from matrix-assisted laser desorption/ionization (MALDI) experiments are dependent, when considered as statistical quantities. False discovery proportion (FDP) control under arbitrary dependency structure among test statistics is an active topic in modern multiple testing research. In this context, we are concerned with the evaluation of associations between the binary outcome variable (describing the phenotype) and multiple predictors derived from MALDI measurements. We propose an inference procedure in which the correlation matrix of the test statistics is utilized. The approach is based on multiple marginal models (MMM). Specifically, we fit a marginal logistic regression model for each predictor individually. Asymptotic joint normality of the stacked vector of the marginal regression coefficients is established under standard regularity assumptions, and their (limiting) correlation matrix is estimated. The proposed method extracts common factors from the resulting empirical correlation matrix. Finally, we estimate the realized FDP of a thresholding procedure for the marginal $p$-values. We demonstrate a practical application of the proposed workflow to MALDI IMS data in an oncological context.
Methodology
What problem does this paper attempt to address?
The paper addresses multiple two-sample hypothesis testing under an arbitrary covariance dependency structure, with an application to imaging mass spectrometry (IMS). The central concern is the false discovery proportion (FDP) when a large number of tests are performed simultaneously. In the motivating application, matrix-assisted laser desorption/ionization (MALDI) mass spectra are analyzed to identify molecular masses associated with a particular phenotype, such as a cancer subtype. The proposed inference procedure accounts for the dependence among the test statistics by using their correlation matrix. It is based on multiple marginal models (MMM): a marginal logistic regression model is fitted for each predictor individually, the limiting correlation matrix of the stacked marginal regression coefficients is estimated, and common factors are extracted from the resulting empirical correlation matrix. The realized FDP of a thresholding procedure for the marginal $p$-values is then estimated, and the workflow is demonstrated on MALDI IMS data in an oncological context.
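The marginal-model step described above (one logistic regression per predictor, followed by thresholding of the marginal $p$-values) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `fit_marginal_logit`, `wald_p_value`, and `naive_fdp_estimate` are hypothetical, and each fit uses a plain Newton-Raphson iteration with an intercept and a single slope. The FDP estimate shown is the simple ratio $m t / R(t)$, which ignores dependence. The paper's estimator instead adjusts for common factors extracted from the estimated correlation matrix of the test statistics, and that step is not attempted here.

```python
import math


def fit_marginal_logit(x, y, n_iter=50, tol=1e-10):
    """Fit logit P(y=1 | x) = b0 + b1*x by Newton-Raphson.

    Returns the slope estimate b1 and its Wald standard error.
    Assumes the data are not perfectly separated (otherwise the
    iteration does not converge).
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w                 # Fisher information entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: H^{-1} g
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Var(b1) is the slope entry of the inverse information matrix.
    return b1, math.sqrt(h00 / det)


def wald_p_value(b1, se):
    """Two-sided p-value for H0: b1 = 0 from the Wald z statistic."""
    z = b1 / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def naive_fdp_estimate(p_values, t):
    """Dependence-ignoring FDP estimate m*t / R(t) at threshold t.

    NOT the factor-adjusted estimator of the paper; shown only to
    illustrate what thresholding the marginal p-values means.
    """
    m = len(p_values)
    rejections = sum(1 for p in p_values if p <= t)
    if rejections == 0:
        return 0.0
    return min(1.0, m * t / rejections)
```

In a study with many predictors, `fit_marginal_logit` would be called once per predictor against the same binary phenotype vector, and the resulting $p$-values passed to the thresholding step. Keeping each fit marginal is what makes the approach feasible when the number of predictors far exceeds the number of observations.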