Inference for linear functionals of high-dimensional longitudinal proteomics data using generalized estimating equations

Lu Xia, Ali Shojaie
DOI: https://doi.org/10.48550/arXiv.2207.11686
2024-09-03
Abstract: Regression analysis of correlated data, where multiple correlated responses are recorded on the same unit, is ubiquitous in many scientific areas. With the advent of new technologies, in particular high-throughput omics profiling assays, such correlated data increasingly consist of a large number of variables relative to the available sample size. Motivated by recent longitudinal proteomics studies of COVID-19, we propose a novel inference procedure for linear functionals of high-dimensional regression coefficients in generalized estimating equations, which are widely used to analyze correlated data. Our estimator for this more general inferential target, obtained by constructing projected estimating equations, is shown to be asymptotically normally distributed under mild regularity conditions. We also introduce a data-driven cross-validation procedure to select the tuning parameter for estimating the projection direction, a step that existing procedures do not address. We illustrate the utility of the proposed procedure by providing confidence intervals for the associations between individual proteins and severe COVID risk scores derived from high-dimensional proteomics data, and we demonstrate its robust finite-sample performance, especially in estimation bias and confidence interval coverage, via extensive simulations.
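The asymptotic normality stated in the abstract is what justifies Wald-type confidence intervals for a linear functional of the coefficients. The following is a minimal sketch of that last step only, assuming a point estimate and its standard error have already been obtained; the function name `wald_interval` and the numbers are hypothetical illustrations, not the authors' code or results, and the sketch does not implement the projected estimating equations themselves.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Two-sided Wald interval from a normal approximation.

    `estimate` and `std_error` are assumed to come from some
    asymptotically normal estimator (hypothetical inputs here).
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    if not 0 < level < 1:
        raise ValueError("level must lie strictly between 0 and 1")
    # Upper (1 + level) / 2 quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# Illustrative numbers only: estimate 0.8 with standard error 0.25.
lower, upper = wald_interval(0.8, 0.25)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

An interval that excludes zero would suggest a nonzero association at the chosen level, provided the normal approximation is adequate for the sample at hand.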