SpaRED benchmark: Enhancing Gene Expression Prediction from Histology Images with Spatial Transcriptomics Completion

Gabriel Mejia, Daniela Ruiz, Paula Cárdenas, Leonardo Manrique, Daniela Vega, Pablo Arbeláez
2024-09-28
Abstract: Spatial Transcriptomics is a novel technology that aligns histology images with spatially resolved gene expression profiles. Although groundbreaking, it struggles with gene capture, yielding high corruption in the acquired data. Given its potential applications, recent efforts have focused on predicting transcriptomic profiles solely from histology images. However, differences in databases, preprocessing techniques, and training hyperparameters hinder a fair comparison between methods. To address these challenges, we present a systematically curated and processed database collected from 26 public sources, representing an 8.6-fold increase compared to previous works. Additionally, we propose a state-of-the-art transformer-based completion technique for inferring missing gene expression, which significantly boosts the performance of transcriptomic profile predictions across all datasets. Altogether, our contributions constitute the most comprehensive benchmark of gene expression prediction from histology images to date and a stepping stone for future research on spatial transcriptomics.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is missing data in Spatial Transcriptomics (ST) and its effect on gene expression prediction. Specifically, the paper focuses on predicting gene expression profiles from histology images more accurately when a high proportion of the measured expression values is missing, and on the lack of a common database and protocol for comparing prediction methods fairly. The authors make the following contributions:

1. **Construction of the SpaRED Database**: Systematically collected, organized, and standardized data from 26 public sources, covering 9 different tissue types from humans and mice. This is an 8.6-fold increase in data volume compared to previous works.
2. **Proposing the SpaCKLE Model**: A Transformer-based completion technique that infers missing gene expression values. Beyond performing well on the completion task itself, it significantly improves the performance of the evaluated gene expression prediction methods.
3. **Establishing Benchmarks**: Evaluated 7 state-of-the-art gene expression prediction methods on the SpaRED database and established new benchmarks, showing that prediction performance is significantly enhanced after the data is completed with SpaCKLE.

Overall, the paper aims to advance gene expression prediction in spatial transcriptomics by providing a high-quality database and an innovative data completion technique.
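
To make the completion task concrete, the following is a minimal sketch of how missing expression values can be hidden, filled in, and scored. It does not implement SpaCKLE: a naive neighbor-mean fill stands in for the Transformer, and the function names, the use of `None` as the missing-value marker, the neighbor map, and the toy data are all assumptions made for illustration.

```python
def neighbor_mean_complete(expr, neighbors):
    """Fill missing entries (None) with the mean of the same gene over
    neighboring spots. Naive stand-in baseline, NOT the SpaCKLE model.

    expr:      list of spots, each a list of per-gene values (None = missing)
    neighbors: dict mapping spot index -> list of neighboring spot indices
    Falls back to the gene's mean over all observed spots, then to 0.0.
    """
    n_genes = len(expr[0])
    # Mean of each gene over the spots where it was actually observed.
    global_mean = []
    for g in range(n_genes):
        vals = [row[g] for row in expr if row[g] is not None]
        global_mean.append(sum(vals) / len(vals) if vals else 0.0)

    completed = []
    for s, row in enumerate(expr):
        new_row = []
        for g, value in enumerate(row):
            if value is None:
                local = [expr[n][g] for n in neighbors.get(s, [])
                         if expr[n][g] is not None]
                value = sum(local) / len(local) if local else global_mean[g]
            new_row.append(value)
        completed.append(new_row)
    return completed


def masked_mse(truth, masked, completed):
    """Mean squared error restricted to the entries that were hidden."""
    errors = [(completed[s][g] - truth[s][g]) ** 2
              for s in range(len(truth))
              for g in range(len(truth[0]))
              if masked[s][g] is None and truth[s][g] is not None]
    return sum(errors) / len(errors) if errors else 0.0


# Toy data (hypothetical): 4 spots along a line, 2 genes.
truth = [[1.0, 4.0], [2.0, 4.0], [3.0, 4.0], [4.0, 4.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Hide two known values so the completion can be scored against them.
masked = [row[:] for row in truth]
masked[1][0] = None
masked[2][1] = None

completed = neighbor_mean_complete(masked, neighbors)
error = masked_mse(truth, masked, completed)
```

Hiding values that are actually known and scoring only those entries is one common way to evaluate a completion method, since the truly missing values have no ground truth to compare against. A learned model would replace `neighbor_mean_complete` while the masking and scoring steps stay the same.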