Abstract: High-throughput, spatially resolved gene expression techniques are poised to be transformative across biology by overcoming a central limitation in single-cell biology: the lack of information on the relationships that organize cells into the functional groupings characteristic of tissues in complex multicellular organisms. Spatial expression is particularly interesting in the mammalian brain, which has a highly defined structure, strong spatial constraints on its organization, and detailed multimodal phenotypes for cells and ensembles of cells that can be linked to mesoscale properties such as projection patterns and, from there, to the circuits that generate behavior. However, as with any type of expression data, cross-dataset benchmarking of spatial data is a crucial first step. Here, we assess the replicability, with reference to canonical brain subdivisions, between the Allen Institute’s in situ hybridization data from the adult mouse brain (the Allen Brain Atlas, ABA) and a similar dataset collected using spatial transcriptomics (ST). With the advent of tractable spatial techniques, we are able, for the first time, to benchmark the Allen Institute’s whole-brain, whole-transcriptome spatial expression dataset against a second, independent dataset that similarly spans the whole brain and transcriptome. We use regularized linear regression (LASSO), linear regression, and correlation-based feature selection in a supervised learning framework to classify expression samples according to their assayed location. We show that Allen Reference Atlas labels are classifiable from transcription in both datasets, but that performance is higher in the ABA than in ST. Furthermore, models trained in one dataset and tested in the opposite dataset do not reproduce classification performance bidirectionally. While an identifying expression profile can be found for a given brain area, it does not generalize to the opposite dataset.
In general, we found that canonical brain area labels are classifiable in gene expression space within a dataset and that the observed performance does not merely reflect physical distance in the brain. However, we also show that cross-platform classification is not robust. Emerging spatial datasets from the mouse brain will allow further characterization of cross-dataset replicability, ultimately providing a valuable reference set for understanding the cell biology of the brain.
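The within-dataset versus cross-dataset comparison described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the helper names (`make_dataset`, `fit_lasso`, `predict`, `auroc`), the marker-gene indices, the effect size, and the penalty `alpha` are all hypothetical, LASSO is fitted with a simple proximal-gradient (ISTA) loop, and scoring by AUROC is an assumption made for the example. Dataset B is simulated with only partially overlapping marker genes, as a stand-in for a platform difference.

```python
import numpy as np

def make_dataset(rng, n_samples, n_genes, marker_idx, effect=1.5):
    """Toy expression matrix: marker genes are shifted upward inside the region."""
    y = rng.integers(0, 2, size=n_samples)       # 1 = sample assayed inside the region
    X = rng.normal(size=(n_samples, n_genes))
    X[:, marker_idx] += effect * y[:, None]      # region-specific expression signal
    return X, y.astype(float)

def fit_lasso(X, y, alpha=0.05, n_iter=500):
    """LASSO via ISTA: minimize (1/2n)||r - Zw||^2 + alpha * ||w||_1."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = (X - mu) / sd                            # standardize genes on the training set
    y_mean = y.mean()
    r = y - y_mean                               # centering the labels absorbs the intercept
    n = len(y)
    lipschitz = np.linalg.norm(Z, 2) ** 2 / n    # largest eigenvalue of Z^T Z / n
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = Z.T @ (Z @ w - r) / n
        w = w - grad / lipschitz                 # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - alpha / lipschitz, 0.0)  # soft threshold
    return w, mu, sd, y_mean

def predict(model, X):
    """Score samples with the training-set standardization and fitted weights."""
    w, mu, sd, y_mean = model
    return ((X - mu) / sd) @ w + y_mean

def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

rng = np.random.default_rng(0)
n_genes = 50
# Hypothetical setup: the two "platforms" share only some marker genes for the region.
X_a, y_a = make_dataset(rng, 300, n_genes, marker_idx=[0, 1, 2, 3, 4])
X_b, y_b = make_dataset(rng, 300, n_genes, marker_idx=[3, 4, 5, 6, 7])

# Train on part of dataset A, then test on held-out A and on all of B.
model = fit_lasso(X_a[:200], y_a[:200])
w = model[0]
auc_within = auroc(predict(model, X_a[200:]), y_a[200:])
auc_cross = auroc(predict(model, X_b), y_b)

print("genes with nonzero weight:", np.flatnonzero(w))
print("within-dataset AUROC:", auc_within)
print("cross-dataset AUROC:", auc_cross)
```

Swapping the roles of the two datasets (train on B, test on A) gives the other direction of the bidirectional comparison. In this toy setup, any gap between the two AUROC values comes entirely from the assumed mismatch in marker genes, so it says nothing about the size of the effect in the real ABA and ST data.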

Statistical Testing in Transcriptomic‐neuroimaging Studies: A How‐to and Evaluation of Methods Assessing Spatial and Gene Specificity

Statistical testing and annotation of gene transcriptomic-neuroimaging associations

Spanve: an Statistical Method to Detect Clustering-friendly Spatially Variable Genes in Large-scale Spatial Transcriptomics Data

Bayesian hidden mark interaction model for detecting spatially variable genes in imaging-based spatially resolved transcriptomics data

Unraveling the molecular relevance of brain phenotypes: A comparative analysis of null models and test statistics

Statistical Power Analysis for Designing Bulk, Single-Cell, and Spatial Transcriptomics Experiments: Review, Tutorial, and Perspectives

Statistical Testing of Shared Genetic Control for Potentially Related Traits

Assessing the replicability of spatial gene expression using atlas data from the adult mouse brain

Of mice and men: Sparse statistical modeling in cardiovascular genomics

Site effects how-to and when: An overview of retrospective techniques to accommodate site effects in multi-site neuroimaging analyses

Benchmarking Computational Integration Methods for Spatial Transcriptomics Data

A Statistical Approach for Detecting Common Features

Statistical and machine learning methods for spatially resolved transcriptomics data analysis

A Bayesian modified Ising model for identifying spatially variable genes from spatial transcriptomics data

Increasing Power for Voxel-Wise Genome-Wide Association Studies: the Random Field Theory, Least Square Kernel Machines and Fast Permutation Procedures.

Statistical significance of variables driving systematic variation

Multiple Comparison Procedures for Neuroimaging Genomewide Association Studies

The impact of heterogeneous spatial autocorrelation on comparisons of brain maps

High-dimensional Bayesian Model for Disease-Specific Gene Detection in Spatial Transcriptomics

Statistical testing and power analysis for brain-wide association study

A robust statistical approach for finding informative spatially associated pathways