Optimal Estimation of Simultaneous Signals Using Absolute Inner Product with Applications to Integrative Genomics

Rong Ma, T. Tony Cai, Hongzhe Li
DOI: https://doi.org/10.5705/ss.202019.0445
2020-10-05
Abstract: Integrating the summary statistics from genome-wide association study (GWAS) and expression quantitative trait loci (eQTL) data provides a powerful way of identifying the genes whose expression levels are potentially associated with complex diseases. A parameter called the $T$-score, defined through the mean values of two Gaussian sequences, is introduced to quantify the genetic overlap between a gene and the disease phenotype based on the summary statistics. Specifically, given two independent samples $\mathbf{x}_n\sim N(\theta, \Sigma_1)$ and $\mathbf{y}_n\sim N(\mu, \Sigma_2)$, the $T$-score is defined as $\sum_{i=1}^n |\theta_i\mu_i|$, a non-smooth functional that characterizes the amount of shared signal between the two absolute normal mean vectors $|\theta|$ and $|\mu|$. Using approximation theory, estimators are constructed and shown to be minimax rate-optimal and adaptive over various parameter spaces. Simulation studies demonstrate the superiority of the proposed estimators over existing methods. The method is applied to an integrative analysis of heart failure genomics datasets, identifying several genes and biological pathways that are potentially causal to human heart failure.
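To make the definition concrete, the following minimal sketch evaluates the $T$-score $\sum_{i=1}^n |\theta_i\mu_i|$ on synthetic mean vectors and compares it with a naive plug-in quantity $\sum_{i=1}^n |x_i y_i|$ computed from noisy observations. This is not the estimator constructed in the paper. The function names `t_score` and `naive_plug_in`, the sparse signal pattern, and the identity covariances are all assumptions made purely for illustration.

```python
import numpy as np

def t_score(theta, mu):
    """T-score functional from the abstract: sum_i |theta_i * mu_i|."""
    theta = np.asarray(theta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return float(np.sum(np.abs(theta * mu)))

def naive_plug_in(x, y):
    """Hypothetical baseline: plug the noisy observations into the functional.
    This is NOT the paper's estimator; it only illustrates the difficulty."""
    return t_score(x, y)

rng = np.random.default_rng(0)
n = 1000

# Assumed sparse means: theta is nonzero on indices 0..49, mu on 25..74,
# so the two signals overlap on 25 coordinates.
theta = np.zeros(n)
theta[:50] = 3.0
mu = np.zeros(n)
mu[25:75] = -2.0

truth = t_score(theta, mu)  # 25 overlapping coordinates, each |3 * (-2)| = 6

# Assumption: Sigma_1 = Sigma_2 = identity, i.e. independent unit-variance noise.
estimates = [
    naive_plug_in(theta + rng.standard_normal(n), mu + rng.standard_normal(n))
    for _ in range(200)
]
mean_naive = float(np.mean(estimates))

print(f"true T-score:       {truth:.1f}")
print(f"mean naive plug-in: {mean_naive:.1f}")
```

Under these assumptions the plug-in average sits well above the true value, because every pure-noise coordinate contributes a positive amount $|x_i y_i|$ to the sum. This upward bias is one way to see why a non-smooth functional of this kind calls for a more careful construction than direct substitution.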
Methodology, Statistics Theory