FastMix: A Versatile Data Integration Pipeline for Cell Type-Specific Biomarker Inference

Yun Zhang, Hao Sun, Aishwarya Mandava, Brian D Aevermann, Tobias R Kollmann, Richard H Scheuermann, Xing Qiu, Yu Qian
DOI: https://doi.org/10.1093/bioinformatics/btac585
IF: 5.8
2022-08-27
Bioinformatics
Abstract:
Motivation: Flow cytometry (FCM) and transcription profiling are two widely used assays in translational immunology research. However, there is no data integration pipeline for analyzing these two types of assays together with experiment variables for biomarker inference. Current FCM data analysis relies mainly on subjective manual gating, which is difficult to integrate directly with other automated computational methods. Existing deconvolution analysis of bulk transcriptomics relies on predefined marker genes in the transcriptomics data, which are unavailable for novel cell types, and it does not utilize the FCM data that provide canonical phenotypic definitions of the cell types.
Results: We developed a novel analytics pipeline, FastMix, for computational immunology, which integrates flow cytometry, bulk transcriptomics, and clinical covariates for identifying cell type-specific gene expression signatures and biomarker genes. FastMix addresses the "large p, small n" problem in the gene expression and flow cytometry integration analysis via a linear mixed effects model (LMER) for both cross-sectional and longitudinal studies. Its novel moment-based estimator not only reduces bias in parameter estimation but is also more efficient than iterative optimization. The FastMix pipeline also includes a cutting-edge flow cytometry data analysis method, DAFi, for identifying cell populations of interest and their characteristics. Simulation studies showed that FastMix produced smaller type I/II errors than competing methods. Validation using real data from two vaccine studies showed that FastMix identified a set of signature genes consistent with independent single cell RNA-seq analysis and produced additional interesting findings.
Availability: Source code of FastMix is publicly available at https://github.com/terrysun0302/FastMix.
Supplementary information: Supplementary text and data are available at Bioinformatics online.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
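
The integration idea in the abstract, combining per-sample cell-type proportions from flow cytometry with bulk gene expression to recover cell type-specific signals, can be illustrated with a small simulation. The sketch below is not FastMix: it uses plain ordinary least squares in place of the paper's linear mixed effects model and moment-based estimator, and it omits clinical covariates and random effects. All data are simulated, and every name (`props`, `signatures`, `bulk`) and dimension is a hypothetical chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: far more genes than samples ("large p, small n").
n_samples, n_genes, n_cell_types = 40, 200, 3

# Simulated cell-type proportions per sample (each row sums to 1), standing in
# for what a flow cytometry analysis would provide.
props = rng.dirichlet(alpha=np.ones(n_cell_types), size=n_samples)

# Simulated cell type-specific expression level for each gene.
signatures = rng.gamma(shape=2.0, scale=1.0, size=(n_cell_types, n_genes))

# Bulk expression is modeled as a proportion-weighted mixture plus noise.
bulk = props @ signatures + rng.normal(scale=0.05, size=(n_samples, n_genes))

# Ordinary least squares (an assumption of this sketch, not the paper's
# estimator): one regression per gene, all solved in a single call.
estimated, *_ = np.linalg.lstsq(props, bulk, rcond=None)

max_abs_error = float(np.max(np.abs(estimated - signatures)))
print(f"max abs error: {max_abs_error:.3f}")
```

Each column of `estimated` holds one gene's estimated expression in each cell type. Fitting every gene independently like this ignores information shared across genes, which is the kind of limitation a mixed effects formulation is meant to address.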