Unsupervised pattern identification in spatial gene expression atlas reveals mouse brain regions beyond established ontology

Robert Cahill,Yu Wang,R Patrick Xian,Alex J Lee,Hongkui Zeng,Bin Yu,Bosiljka Tasic,Reza Abbasi-Asl
DOI: https://doi.org/10.1073/pnas.2319804121
2024-09-10
Abstract:The rapid growth of large-scale spatial gene expression data demands efficient and reliable computational tools to extract major trends of gene expression in their native spatial context. Here, we used stability-driven unsupervised learning (i.e., staNMF) to identify principal patterns (PPs) of 3D gene expression profiles and understand spatial gene distribution and anatomical localization at the whole mouse brain level. Our subsequent spatial correlation analysis systematically compared the PPs to known anatomical regions and ontology from the Allen Mouse Brain Atlas using spatial neighborhoods. We demonstrate that our stable and spatially coherent PPs, whose linear combinations accurately approximate the spatial gene data, are highly correlated with combinations of expert-annotated brain regions. These PPs yield a brain ontology based purely on spatial gene expression. Our PP identification approach outperforms principal component analysis and typical clustering algorithms on the same task. Moreover, we show that the stable PPs reveal marked regional imbalance of brainwide genetic architecture, leading to region-specific marker genes and gene coexpression networks. Our findings highlight the advantages of stability-driven machine learning for plausible biological discovery from dense spatial gene expression data, streamlining tasks that are infeasible by conventional manual approaches.
What problem does this paper attempt to address?