Abstract: The human brain is a complex interconnected structure controlling all elementary and high-level cognitive tasks. It is composed of many regions that exhibit specific distributions of cell types and distinct patterns of functional connections. This complexity is rooted in differential transcription: the constituent cell types of different brain regions express distinctive combinations of genes as they develop and mature, ultimately shaping their functional state in adulthood. How precisely the genetic information of anatomical structures is connected to their underlying biological functions remains an open question in modern neuroscience. A major challenge is the identification of universal patterns, which do not depend on the particular individual but are instead basic structural properties shared by all brains. Despite the vast amount of gene expression data available at both the bulk and single-cell levels, this task remains difficult, mainly due to the lack of suitable data mining tools. In this paper, we propose an approach to address this issue based on a hierarchical version of Stochastic Block Modeling. Thanks to its specific choice of priors, the method is particularly effective in identifying these universal features. We test our algorithm on a dataset from the Allen Human Brain Atlas obtained from six independent human brains. We show that the proposed method identifies universal patterns much better than more traditional algorithms such as Latent Dirichlet Allocation or Weighted Correlation Network Analysis. The probabilistic association between genes and samples that we find closely reflects the known anatomical and functional organization of the brain. Moreover, leveraging the characteristic fuzzy structure of the gene sets obtained with our method, we identify examples of transcriptional and post-transcriptional pathways associated with specific brain regions, highlighting the potential of our approach.


Topic Modeling analysis of the Allen Human Brain Atlas
