Mixture Models for Single Cell Assays with Applications to Vaccine Studies

Greg Finak, Andrew McDavid, Pratip Chattopadhyay, Maria Dominguez, Steve De Rosa, Mario Roederer, Raphael Gottardo
DOI: https://doi.org/10.48550/arXiv.1208.5809
2012-08-31
Abstract: In immunological studies, the characterization of small, functionally distinct cell subsets from blood and tissue is crucial to decipher system-level biological changes. An increasing number of studies rely on assays that provide single-cell measurements of multiple genes and proteins from bulk cell samples. A common problem in the analysis of such data is to identify biomarkers (or combinations thereof) that are differentially expressed between two biological conditions (e.g., before/after vaccination), where expression is defined as the proportion of cells expressing the biomarker or combination in the cell subset of interest. Here, we present a Bayesian hierarchical framework based on a beta-binomial mixture model for testing for differential biomarker expression using single-cell assays. Our model allows inference to be subject-specific, as is typically required when assessing vaccine responses, while borrowing strength across subjects through common prior distributions. We propose two approaches for parameter estimation: an empirical-Bayes approach using an Expectation-Maximization algorithm and a fully Bayesian one based on a Markov chain Monte Carlo algorithm. We compare our method against frequentist approaches for single-cell assays including Fisher's exact test, a likelihood ratio test, and basic log-fold changes. Using several experimental assays measuring proteins or genes at the single-cell level and simulated data, we show that our method has higher sensitivity and specificity than alternative methods. Additional simulations show that our framework is also robust to model misspecification. Finally, we also demonstrate how our approach can be extended to testing multivariate differential expression across multiple biomarker combinations using a Dirichlet-multinomial model and illustrate this multivariate approach using single-cell gene expression data and simulations.
Applications, Methodology