Robust identification of temporal biomarkers in longitudinal omics studies

Ahmed A Metwally, Tom Zhang, Si Wu, Ryan Kellogg, Wenyu Zhou, Kevin Contrepois, Hua Tang, Michael Snyder
DOI: https://doi.org/10.1093/bioinformatics/btac403
IF: 5.8
2022-08-02
Bioinformatics
Abstract:
Motivation: Longitudinal studies increasingly collect rich 'omics' data sampled frequently over time and across large cohorts to capture dynamic health fluctuations and disease transitions. However, the generation of longitudinal omics data has preceded the development of analysis tools that can efficiently extract insights from such data. In particular, there is a need for statistical frameworks that can identify not only which omics features are differentially regulated between groups but also over what time intervals. Additionally, longitudinal omics data may have inconsistencies, including non-uniform sampling intervals, missing data points, subject dropout and differing numbers of samples per subject.

Results: In this work, we developed OmicsLonDA, a statistical method that provides robust identification of time intervals of temporal omics biomarkers. OmicsLonDA is based on a semi-parametric approach, in which we use smoothing splines to model longitudinal data and infer significant time intervals of omics features based on an empirical distribution constructed through a permutation procedure. We benchmarked OmicsLonDA on five simulated datasets with diverse temporal patterns, and the method showed specificity greater than 0.99 and sensitivity greater than 0.87. Applying OmicsLonDA to the iPOP cohort revealed temporal patterns of genes, proteins, metabolites and microbes that are differentially regulated in male versus female subjects following a respiratory infection. In addition, we applied OmicsLonDA to a longitudinal multi-omics dataset of pregnant women with and without preeclampsia, and OmicsLonDA identified potential lipid markers that are temporally significantly different between the two groups.

Availability and implementation: We provide an open-source R package (https://bioconductor.org/packages/OmicsLonDA), to enable widespread use.

Supplementary information: Supplementary data are available at Bioinformatics online.
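
The general idea in the Results paragraph (fit a curve per subject, compare group curves interval by interval, and judge each interval against a permutation-based empirical distribution) can be illustrated with a small sketch. This is not the OmicsLonDA implementation or its API: piecewise-linear interpolation is swapped in for smoothing splines to keep the example dependency-free, the test statistic (absolute area between group mean curves) is an assumption, the data are synthetic, and every function name here is hypothetical.

```python
import random
from statistics import mean


def interp(times, values, t):
    """Piecewise-linear interpolation; a simple stand-in for a smoothing spline."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            v0, v1 = values[i], values[i + 1]
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def group_curve(subjects, grid):
    """Average the per-subject curves on a common time grid."""
    return [mean(interp(ts, vs, t) for ts, vs in subjects) for t in grid]


def interval_stats(group_a, group_b, grid):
    """Absolute trapezoid area between the group mean curves, per grid interval."""
    a = group_curve(group_a, grid)
    b = group_curve(group_b, grid)
    diff = [x - y for x, y in zip(a, b)]
    return [
        abs((diff[i] + diff[i + 1]) / 2 * (grid[i + 1] - grid[i]))
        for i in range(len(grid) - 1)
    ]


def permutation_pvalues(group_a, group_b, grid, n_perm=200, seed=0):
    """Empirical p-value per interval from shuffling subject group labels."""
    rng = random.Random(seed)
    observed = interval_stats(group_a, group_b, grid)
    pooled = group_a + group_b
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        perm = interval_stats(shuffled[: len(group_a)], shuffled[len(group_a):], grid)
        for i, (p, o) in enumerate(zip(perm, observed)):
            if p >= o:
                exceed[i] += 1
    # Add-one correction so an empirical p-value is never exactly zero.
    return [(e + 1) / (n_perm + 1) for e in exceed]


def make_subject(times, shift):
    """Synthetic subject: flat at 0 until t = 4, then shifted by `shift`."""
    return (times, [shift if t > 4 else 0.0 for t in times])


# Two sampling schedules to mimic non-uniform sampling across subjects.
schedules = [[0, 1, 3, 4, 6, 7, 9, 10], [0, 2, 4, 5, 6, 8, 10]]
group_a = [make_subject(schedules[k % 2], 0.0) for k in range(6)]
group_b = [make_subject(schedules[k % 2], 3.0) for k in range(6)]

grid = [0, 2, 4, 6, 8, 10]
pvals = permutation_pvalues(group_a, group_b, grid)
for (start, end), p in zip(zip(grid, grid[1:]), pvals):
    print(f"interval [{start}, {end}]: p = {p:.3f}")
```

In this toy setup the two groups are identical before t = 4 and differ afterwards, so only the later intervals should receive small p-values. A real analysis would also need multiple-testing correction across intervals and features, which the sketch omits.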