Topic Modeling analysis of the Allen Human Brain Atlas

Letizia Pizzini, Filippo Valle, Matteo Osella, Michele Caselle
DOI: https://doi.org/10.1101/2024.10.11.617855
2024-10-13
Abstract: The human brain is a complex, interconnected structure that controls all elementary and high-level cognitive tasks. It is composed of many regions that exhibit specific distributions of cell types and distinct patterns of functional connections. This complexity is rooted in differential transcription. The constituent cell types of different brain regions express distinctive combinations of genes as they develop and mature, ultimately shaping their functional state in adulthood. How precisely the genetic information of anatomical structures is linked to their underlying biological functions remains an open question in modern neuroscience. A major challenge is identifying universal patterns: features that do not depend on the particular individual but are basic structural properties shared by all brains. Despite the vast amount of gene expression data available at both the bulk and single-cell levels, this task remains difficult, mainly because suitable data mining tools are lacking. In this paper, we propose an approach to this problem based on a hierarchical version of Stochastic Block Modeling. Thanks to its specific choice of priors, the method is particularly effective at identifying such universal features. To test the algorithm, we use a dataset obtained from six independent human brains in the Allen Human Brain Atlas. We show that the proposed method identifies universal patterns much better than more traditional algorithms such as Latent Dirichlet Allocation or Weighted Correlation Network Analysis. The probabilistic association between genes and samples that we find closely reflects the known anatomical and functional organization of the brain. Moreover, by leveraging the distinctive fuzzy structure of the gene sets obtained with our method, we identify examples of transcriptional and post-transcriptional pathways associated with specific brain regions, highlighting the potential of our approach.
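A minimal sketch of how a probabilistic gene-sample association can be read off a network partition, under stated assumptions: following the usual topic-modeling analogy, samples are treated as documents and genes as words, and expression counts become weighted edges of a bipartite sample-gene network. The function names, the toy counts, and the hard gene-to-topic assignment `gene_topic` are all hypothetical. In a hierarchical stochastic block model that partition would be inferred from the network itself (for example with graph-tool's `minimize_nested_blockmodel_dl`), a step not reproduced here. Given a partition, P(topic | sample) is the share of a sample's edge weight that lands on genes of each topic, and P(gene | topic) is a gene's share of its topic's total weight.

```python
from collections import defaultdict


def build_bipartite_edges(expression, threshold=0):
    """Convert a {sample: {gene: count}} table into weighted (sample, gene, count) edges.

    Samples act as "documents" and genes as "words"; counts <= threshold are dropped.
    """
    edges = []
    for sample, genes in expression.items():
        for gene, count in genes.items():
            if count > threshold:
                edges.append((sample, gene, count))
    return edges


def _normalize(weights):
    """Scale a {key: weight} mapping so that its values sum to 1."""
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


def topic_mixtures(edges, gene_topic):
    """Compute P(topic | sample) and P(gene | topic) from weighted edges.

    gene_topic is a hypothetical hard assignment gene -> topic, standing in for
    the gene partition that an inferred block model would provide.
    """
    sample_topic = defaultdict(lambda: defaultdict(float))
    topic_gene = defaultdict(lambda: defaultdict(float))
    for sample, gene, weight in edges:
        topic = gene_topic[gene]
        sample_topic[sample][topic] += weight  # sample's weight routed to this topic
        topic_gene[topic][gene] += weight      # gene's weight within its topic
    p_topic_given_sample = {s: _normalize(ts) for s, ts in sample_topic.items()}
    p_gene_given_topic = {t: _normalize(gs) for t, gs in topic_gene.items()}
    return p_topic_given_sample, p_gene_given_topic


# Toy usage with made-up counts (not Allen Human Brain Atlas data).
expression = {
    "sample_A": {"GENE_1": 5, "GENE_2": 3, "GENE_3": 0},
    "sample_B": {"GENE_1": 1, "GENE_3": 4},
}
gene_topic = {"GENE_1": 0, "GENE_2": 0, "GENE_3": 1}
p_topic_given_sample, p_gene_given_topic = topic_mixtures(
    build_bipartite_edges(expression), gene_topic
)
```

Using a hard gene partition keeps the sketch short; the sample-side topic mixtures are still probabilistic because each sample spreads its expression over genes belonging to several topics.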
Biophysics