Building gene regulatory networks from scATAC-seq and scRNA-seq using Linked Self Organizing Maps
Camden Jansen, Ricardo N. Ramirez, Nicole C. El-Ali, David Gomez-Cabrero, Jesper Tegner, Matthias Merkenschlager, Ana Conesa, Ali Mortazavi
DOI: https://doi.org/10.1371/journal.pcbi.1006555
2019-11-04
PLoS Computational Biology
Abstract: Rapid advances in single-cell assays have outpaced methods for analyzing the resulting data. Different single-cell assays vary extensively in sensitivity and signal-to-noise levels; in particular, scATAC-seq generates extremely sparse and noisy datasets. Existing methods for analyzing these data require either cells amenable to pseudo-time analysis or datasets with drastically different cell types. We describe a novel approach that uses self-organizing maps (SOMs) to link scATAC-seq regions with scRNA-seq genes; it overcomes these challenges and can generate draft regulatory networks. Our SOMatic package generates chromatin and gene expression SOMs separately and combines them using a linking function. We applied SOMatic to a mouse pre-B cell differentiation time course with controlled Ikaros over-expression to recover gene ontology enrichments, identify motifs in genomic regions showing similar single-cell profiles, and generate a gene regulatory network that both recovers known interactions and predicts new Ikaros targets during the differentiation process. The ability of linked SOMs to detect emergent properties from multiple types of high-dimensional genomic data with very different signal properties opens new avenues for integrative analysis of heterogeneous data.

Gene expression is a tightly controlled process occurring in all cells during all stages of organismal life. How much and when genes are expressed is determined by gene regulatory networks (GRNs), which encode the biological programs that cells can perform. Each cell in an organism is constantly running through these networks to carry out its particular function. New techniques allow us to measure gene expression and chromatin accessibility in individual cells using single-cell RNA-seq (scRNA-seq) and single-cell ATAC-seq (scATAC-seq). However, these techniques have relatively poor signal-to-noise ratios, and those ratios differ between the two assays.
In this work, we use a form of unsupervised learning called self-organizing maps (SOMs) to analyze one step of B cell differentiation by linking separately trained scRNA-seq and scATAC-seq SOMs. We mine the linked SOMs to reconstruct the underlying GRN using a top-down approach. The resulting GRN not only recapitulated known regulatory linkages but also identified a large number of potential new regulatory connections in the system. These methods should be generally applicable to linking heterogeneous high-throughput data with different signal-to-noise profiles.
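The linked-SOM idea can be illustrated with a small sketch. What follows is a generic Kohonen SOM written with NumPy, not the SOMatic implementation. The input layout (one row per region or gene, one column per cell), the grid sizes, the learning-rate and neighborhood schedules, and the toy random data are all assumptions. In particular, `link_units` is a hypothetical stand-in for SOMatic's linking function: it simply counts assumed region-to-gene pairs between the units each feature maps to on the two separately trained maps.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular Kohonen SOM; returns unit weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    n_units = rows * cols
    # Grid coordinates of every unit, used by the Gaussian neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize unit weights from randomly chosen input rows.
    weights = data[rng.integers(0, len(data), n_units)].astype(float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighborhood radius
        x = data[rng.integers(0, len(data))]   # one random training sample
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))   # neighborhood weights around the BMU
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign_units(data, weights):
    """Index of the best-matching unit for each row of data."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def link_units(region_units, gene_units, pairs, n_atac_units, n_rna_units):
    """HYPOTHETICAL linking rule: count region-gene pairs joining each ATAC unit to each RNA unit."""
    links = np.zeros((n_atac_units, n_rna_units), dtype=int)
    for region_idx, gene_idx in pairs:
        links[region_units[region_idx], gene_units[gene_idx]] += 1
    return links

# Toy random data only: the two maps are trained separately, so the cell counts may differ.
rng = np.random.default_rng(1)
atac = rng.random((200, 40))   # 200 regions x 40 scATAC-seq cells
rna = rng.random((150, 30))    # 150 genes x 30 scRNA-seq cells
atac_w = train_som(atac, 4, 4)
rna_w = train_som(rna, 3, 3)
region_units = assign_units(atac, atac_w)
gene_units = assign_units(rna, rna_w)
pairs = [(i, i % 150) for i in range(200)]  # hypothetical region -> gene assignments
links = link_units(region_units, gene_units, pairs, 16, 9)
print(links.shape, links.sum())  # (16, 9) 200
```

Training the maps separately lets each assay keep its own feature space and noise profile; the two maps only meet through the link matrix, whose nonzero cells point to pairs of chromatin and expression units that could then be examined for candidate regulatory interactions.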
biochemical research methods, mathematical & computational biology