Breast Cancer Histopathology Image based Gene Expression Prediction using Spatial Transcriptomics data and Deep Learning
Md Mamunur Rahaman, Ewan K. A. Millar, Erik Meijering
DOI: https://doi.org/10.1038/s41598-023-40219-0
2023-03-17
Abstract: Tumour heterogeneity in breast cancer poses challenges in predicting outcome and response to therapy. Spatial transcriptomics technologies may address these challenges, as they provide a wealth of information about gene expression at the cell level, but they are expensive, hindering their use in large-scale clinical oncology studies. Predicting gene expression from hematoxylin and eosin stained histology images provides a more affordable alternative for such studies. Here we present BrST-Net, a deep learning framework for predicting gene expression from histopathology images using spatial transcriptomics data. Using this framework, we trained and evaluated 10 state-of-the-art deep learning models without utilizing pretrained weights for the prediction of 250 genes. To enhance the generalisation performance of the main network, we introduce an auxiliary network into the framework. Our methodology outperforms previous studies, with 237 genes identified with positive correlation, including 24 genes with a median correlation coefficient greater than 0.50. This is a notable improvement over previous studies, which could predict only 102 genes with positive correlation, with the highest correlation values ranging from 0.29 to 0.34.
Image and Video Processing, Computer Vision and Pattern Recognition, Genomics
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve
This paper addresses the challenges that tumor heterogeneity in breast cancer poses for predicting patients' prognoses and responses to treatment. Spatial transcriptomics technologies can provide rich, cell-level gene expression information, but they are expensive, which makes them difficult to apply widely in large-scale clinical studies. The authors therefore propose a deep learning method that predicts gene expression from hematoxylin and eosin (H&E)-stained histopathological images, offering a more cost-effective alternative.
### Specific Objectives
1. **Develop the BrST-Net Framework**: Build a framework that uses spatial transcriptomics data to train deep learning models that predict gene expression from histopathological images.
2. **Improve Prediction Accuracy**: Introduce an auxiliary network that enhances the generalization performance of the main network, thereby improving the accuracy of gene expression prediction.
3. **Validate Model Performance**: Train and evaluate 10 state-of-the-art deep learning models, without pretrained weights, to predict the expression levels of 250 genes, and compare the results with those of existing methods.
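The auxiliary-network idea in objective 2 can be illustrated with a minimal sketch. It assumes a GoogLeNet-style design: an extra head reads intermediate features and predicts the same gene vector as the main head, and the two mean squared error losses are summed with the auxiliary term down-weighted. The paper's exact architecture and loss may differ. The toy two-layer network, the helper names (`linear`, `forward`, `combined_loss`), and `aux_weight=0.3` (the discount GoogLeNet applied to its auxiliary classifiers) are hypothetical, and plain Python lists stand in for a deep learning library so the example runs anywhere.

```python
from typing import Dict, List, Sequence, Tuple

# (weights, bias) for one fully connected layer; weights[i] is the row for output i.
Layer = Tuple[List[List[float]], List[float]]


def linear(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: y_i = sum_j weights[i][j] * x[j] + bias[i]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def forward(x: Sequence[float], params: Dict[str, Layer]) -> Tuple[List[float], List[float]]:
    """Return (main_pred, aux_pred) gene-expression vectors for one patch's input features."""
    h1 = relu(linear(x, *params["block1"]))       # intermediate features
    h2 = relu(linear(h1, *params["block2"]))      # deeper features
    main_pred = linear(h2, *params["main_head"])  # main network output
    aux_pred = linear(h1, *params["aux_head"])    # hypothetical auxiliary head taps h1
    return main_pred, aux_pred


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over the gene dimension."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def combined_loss(target: Sequence[float], main_pred: Sequence[float],
                  aux_pred: Sequence[float], aux_weight: float = 0.3) -> float:
    """Training objective (assumed): main loss plus a down-weighted auxiliary loss."""
    return mse(main_pred, target) + aux_weight * mse(aux_pred, target)


if __name__ == "__main__":
    # Toy network: 2 input features, 2 hidden units, 1 deeper unit, 2 genes.
    params: Dict[str, Layer] = {
        "block1": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
        "block2": ([[1.0, 1.0]], [-1.0]),
        "main_head": ([[1.0], [2.0]], [0.0, 0.0]),
        "aux_head": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.5]),
    }
    main_pred, aux_pred = forward([1.0, 2.0], params)
    print(main_pred, aux_pred)  # [2.0, 4.0] [1.0, 2.5]
    print(combined_loss([2.0, 3.0], main_pred, aux_pred))
```

In a GoogLeNet-style setup, only the main head's output is used at inference; during training (not shown here), the auxiliary term adds an extra gradient signal to the earlier layers, which is the mechanism usually credited with better generalization.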
### Major Contributions
1. **Innovative Deep Learning Framework**: The BrST-Net framework not only predicts gene expression but also improves the model's generalization ability through an auxiliary network.
2. **Significant Performance Improvement**: BrST-Net predicts expression that correlates positively with the measured values for 237 genes, 24 of which have a median correlation coefficient greater than 0.50. Previous studies predicted only 102 genes with positive correlation, with the highest correlation values ranging from 0.29 to 0.34.
3. **Practical Application Potential**: Once trained, the method predicts gene expression from routine H&E-stained images, so new samples do not require expensive spatial transcriptomics assays. This makes it a more affordable option for large-scale clinical studies.
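The correlation figures in contribution 2 summarise a per-gene evaluation. A minimal sketch of how such numbers can be computed, assuming that for each held-out patient (for example, in leave-one-patient-out cross-validation) Pearson's correlation is computed per gene between predicted and measured expression across that patient's spots, and that the median over patients is then taken; the paper's exact protocol may differ. The input layout and the function names `pearson`, `median_gene_correlations`, and `summarize` are hypothetical.

```python
from math import sqrt
from statistics import median
from typing import Dict, List, Sequence, Tuple

# For one held-out patient: gene name -> (predicted values, measured values) across spots.
PatientResult = Dict[str, Tuple[Sequence[float], Sequence[float]]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for a constant vector; treated as 0 here (an assumption)
    return cov / sqrt(vx * vy)


def median_gene_correlations(per_patient: List[PatientResult]) -> Dict[str, float]:
    """Median, over held-out patients, of each gene's prediction-vs-measurement correlation."""
    genes = per_patient[0].keys()
    return {g: median([pearson(*p[g]) for p in per_patient]) for g in genes}


def summarize(med_corr: Dict[str, float], threshold: float = 0.5) -> Tuple[int, int]:
    """Count genes with positive median correlation and with median correlation > threshold."""
    positive = sum(1 for r in med_corr.values() if r > 0)
    strong = sum(1 for r in med_corr.values() if r > threshold)
    return positive, strong


if __name__ == "__main__":
    pred = [1.0, 2.0, 3.0]
    per_patient = [
        {"G1": (pred, [2.0, 4.0, 6.0]), "G2": (pred, [3.0, 2.0, 1.0])},
        {"G1": (pred, [1.0, 3.0, 2.0]), "G2": (pred, [3.0, 2.0, 1.0])},
        {"G1": (pred, [1.0, 2.0, 3.0]), "G2": (pred, [2.0, 1.0, 3.0])},
    ]
    med = median_gene_correlations(per_patient)
    print(med)             # {'G1': 1.0, 'G2': -1.0}
    print(summarize(med))  # (1, 1)
```

Under this reading, "237 genes with positive correlation" corresponds to the first count returned by `summarize` and "24 genes above 0.50" to the second. Taking the median rather than the mean keeps a single atypical patient from dominating a gene's summary score.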
### Conclusion
Using the BrST-Net framework, the authors predicted the expression of a large number of genes (237 of 250 with positive correlation) from H&E-stained histopathological images, offering a potential tool to support the diagnosis, classification, and treatment of breast cancer. The study improves on the accuracy of earlier gene expression prediction methods and lays a foundation for future large-scale clinical applications.