ARIC: accurate and robust inference of cell type proportions from bulk gene expression or DNA methylation data

Wei Zhang, Hanwen Xu, Rong Qiao, Bixi Zhong, Xianglin Zhang, Jin Gu, Xuegong Zhang, Lei Wei, Xiaowo Wang
DOI: https://doi.org/10.1093/bib/bbab362
IF: 9.5
2021-09-01
Briefings in Bioinformatics
Abstract: Quantifying cell proportions, especially for rare cell types in some scenarios, is of great value in tracking signals associated with certain phenotypes or diseases. Although some methods have been proposed to infer cell proportions from multicomponent bulk data, they are substantially less effective for estimating the proportions of rare cell types, which are highly sensitive to feature outliers and collinearity. Here we propose a new deconvolution algorithm named ARIC to estimate cell type proportions from gene expression or DNA methylation data. ARIC employs a novel two-step marker selection strategy, including collinear feature elimination based on the component-wise condition number and adaptive removal of outlier markers. This strategy can systematically obtain effective markers for weighted $\nu$-support vector regression to ensure a robust and precise rare proportion prediction. We showed that ARIC can accurately estimate fractions in both DNA methylation and gene expression data from different experiments. We further applied ARIC to the survival prediction of ovarian cancer and the condition monitoring of chronic kidney disease, and the results demonstrate the high accuracy and robustness as well as the clinical potential of ARIC. Taken together, ARIC is a promising tool to solve the deconvolution problem of bulk data where rare components are of vital importance.
biochemical research methods, mathematical & computational biology
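
The deconvolution setting summarized in the abstract can be illustrated with a small sketch. This is not ARIC itself: it assumes the usual linear mixing model (bulk signal = reference matrix times cell type proportions), uses simulated noise-free data, and swaps in ordinary least squares for the weighted support vector regression. It also uses the standard 2-norm condition number from NumPy rather than the component-wise condition number named in the abstract. All names (`deconvolve_lstsq`, `reference`, `bulk`, `true_props`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def deconvolve_lstsq(reference, bulk):
    """Estimate proportions by plain least squares.

    Stand-in for illustration only; ARIC uses weighted support vector
    regression after marker selection, which is not reproduced here.
    """
    coef, *_ = np.linalg.lstsq(reference, bulk, rcond=None)
    coef = np.clip(coef, 0.0, None)  # proportions cannot be negative
    return coef / coef.sum()         # rescale so proportions sum to 1


# Simulated reference profiles: rows are markers, columns are cell types.
rng = np.random.default_rng(0)
n_markers, n_types = 200, 4
reference = rng.uniform(0.0, 1.0, size=(n_markers, n_types))

# Assumed ground truth with one rare component (1%).
true_props = np.array([0.60, 0.30, 0.09, 0.01])
bulk = reference @ true_props  # noise-free linear mixture

estimated = deconvolve_lstsq(reference, bulk)

# Collinearity check: a near-duplicate column inflates the condition number.
cond_number = np.linalg.cond(reference)
near_duplicate = reference[:, 0] + 1e-6 * rng.normal(size=n_markers)
reference_collinear = np.column_stack([reference, near_duplicate])
cond_collinear = np.linalg.cond(reference_collinear)

print("estimated proportions:", np.round(estimated, 4))
print("condition number (original):", cond_number)
print("condition number (with near-duplicate column):", cond_collinear)
```

In this noise-free setting least squares recovers the simulated proportions, including the rare component. With real data, noise, outlier markers, and collinear reference profiles make the rare estimates unstable, which is the problem the marker selection steps described in the abstract are meant to address.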