Accurate sample deconvolution of pooled snRNA-seq using sex-dependent gene expression patterns

Guy M Twa,Robert A Phillips III,Nathaniel J Robinson,Jeremy J Day
DOI: https://doi.org/10.1101/2024.11.29.626066
2024-12-03
Abstract:Single nucleus RNA sequencing (snRNA-seq) technology offers unprecedented resolution for studying cell type-specific gene expression patterns. However, snRNA-seq poses high costs and technical limitations, often requiring the pooling of independent biological samples and loss of individual sample-level data. Deconvolution of sample identity using inherent features would enable the incorporation of pooled barcoding and sequencing protocols, thereby increasing data throughput and analytical sample size without requiring increases in experimental sample size and sequencing costs. In this study, we demonstrate a proof of concept that sex-dependent gene expression patterns can be leveraged for the deconvolution of pooled snRNA-seq data. Using previously published snRNA-seq data from the rat ventral tegmental area, we trained a range of machine learning models to classify cell sex using genes differentially expressed in cells from male and female rats. Models that used sex-dependent gene expression predicted cell sex with high accuracy (90-92%) and outperformed simple classification models using only sex chromosome gene expression (69-89%). The generalizability of these models to other brain regions was assessed using an additional published data set from the rat nucleus accumbens. Within this data set, model performance remained highly accurate in cell sex classification (89-90% accuracy) with no additional re-training. This work provides a model for future snRNA-seq studies to perform sample deconvolution using a two-sex pooled sample sequencing design and benchmarks the performance of various machine learning approaches to deconvolve sample identification from inherent sample features.
Genomics
What problem does this paper attempt to address?