TWO-SIGMA: A Novel Two-Component Single Cell Model-Based Association Method for Single-Cell RNA-seq Data

Eric Van Buren, Ming Hu, Chen Weng, Fulai Jin, Yan Li, Di Wu, Yun Li
DOI: https://doi.org/10.1002/gepi.22361
2021-01-01
Genetic Epidemiology
Abstract: Two key challenges in the analysis of single cell RNA-seq (scRNA-seq) data are excess zeros due to “drop-out” events and substantial overdispersion due to stochastic and systematic differences. Association analysis of scRNA-seq data is further confronted with the possible dependency introduced by measuring multiple single cells from the same biological sample. To address these three challenges, we propose TWO-SIGMA: a TWO-component SInGle cell Model-based Association method. The first component models the drop-out probability with a mixed-effects logistic regression, and the second component models the (conditional) mean read count with a mixed-effects negative binomial regression. Our approach simultaneously allows for overdispersion and accommodates dependency in both drop-out probability and mean mRNA abundance at the gene level, leading to improved statistical power while still providing highly interpretable coefficient estimates. Simulation studies and real data analysis show advantages in type-I error control, power enhancement, and parameter estimation over alternative approaches including MAST and a zero-inflated negative binomial model without random effects. TWO-SIGMA is implemented in the R package “twosigma” available at https://github.com/edvanburen/twosigma.
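
To make the two-component structure concrete, the following is a minimal simulation sketch of the kind of data the abstract describes, not the authors' implementation and not the API of the "twosigma" R package. It assumes one gene, a single hypothetical binary sample-level covariate (group), sample-specific random intercepts in both components, and a negative binomial parameterized by a mean and a size parameter (variance = mean + mean^2 / size). All function names, parameter names, and default values are illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    """Inverse logit link used by the drop-out component."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_poisson(rng, lam):
    """Draw a Poisson count with Knuth's method (fine for small means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_neg_binomial(rng, mean, size):
    """Negative binomial as a gamma-Poisson mixture.

    Assumed parameterization: variance = mean + mean**2 / size.
    """
    lam = rng.gammavariate(size, mean / size)
    return sample_poisson(rng, lam)


def simulate_gene(n_samples=6, cells_per_sample=200,
                  alpha=(-1.0, 0.5), beta=(1.0, 0.7),
                  sigma_a=0.5, sigma_b=0.5, size=2.0, seed=0):
    """Simulate counts for one gene as (sample_id, group, count) tuples.

    alpha: hypothetical intercept and group effect on the logit drop-out scale.
    beta: hypothetical intercept and group effect on the log mean scale.
    sigma_a, sigma_b: standard deviations of the per-sample random intercepts.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_samples):
        group = i % 2  # hypothetical sample-level covariate
        a_i = rng.gauss(0.0, sigma_a)  # random intercept, drop-out component
        b_i = rng.gauss(0.0, sigma_b)  # random intercept, mean component
        p_drop = sigmoid(alpha[0] + alpha[1] * group + a_i)
        mu = math.exp(beta[0] + beta[1] * group + b_i)
        for _ in range(cells_per_sample):
            if rng.random() < p_drop:
                count = 0  # drop-out event
            else:
                count = sample_neg_binomial(rng, mu, size)
            rows.append((i, group, count))
    return rows


if __name__ == "__main__":
    rows = simulate_gene()
    zero_fraction = sum(1 for _, _, c in rows if c == 0) / len(rows)
    print(f"cells: {len(rows)}, zero fraction: {zero_fraction:.2f}")
```

Because every cell from the same sample shares the intercepts a_i and b_i, cells within a sample are correlated in both drop-out probability and mean abundance. This is the dependency that the abstract says a zero-inflated negative binomial model without random effects does not account for.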