Development, comparative study, and external validation of a new deep learning model for predicting genome-wide gene expression from histopathology slides.

Leon Gugel, Omer Tirosh, Yaron Kinar, Anna Michalska-Falkowska, Joanna Reszec-Gielazyn, Witold Bauer, Magdalena Niemira, Boris Temkin, Tuvik Beker, Ranit Aharonov, Gal Dinstag
DOI: https://doi.org/10.1200/jco.2024.42.16_suppl.3064
IF: 45.3
2024-06-01
Journal of Clinical Oncology
Abstract 3064

Background: In recent years, tumor molecular profiling in clinical settings has enhanced cancer diagnostics and the delivery of precision oncology. Several methods now predict gene expression directly from hematoxylin and eosin (H&E)-stained histology images, offering a new way to leverage these easily obtainable, cost-effective images for multiple precision oncology applications. We previously introduced such a method, DeepPT, and demonstrated through our ENLIGHT-DeepPT platform that its imputed gene expression can successfully predict drug response. In our previous publication, and in an independent study comparing six different methods, DeepPT exhibited the best overall gene expression prediction accuracy.

Methods: Here we present a new version, DeepPT 2.0, with an improved architecture that includes multi-task learning and a feature space derived from a self-supervised, pre-trained deep network. We tested both versions of DeepPT and the leading competing methods on patient data from 17 cancer subtypes included in The Cancer Genome Atlas (TCGA). In addition, we obtained a new dataset from the Medical University of Bialystok (UMB), consisting of matched slide images and mRNA expression profiles from 151 cases representing 7 cancer subtypes. This dataset serves as external validation of how well DeepPT generalizes.

Results: Both versions of DeepPT show a statistically significant improvement over the other methods in the median Pearson correlation of the top predicted genes, the only metric available for all methods. Moreover, DeepPT 2.0 significantly improves upon version 1.0, with up to a 3-fold increase in the number of well-predicted genes (defined as genes with Pearson ρ > 0.4 between actual and predicted mRNA expression) in 14 of the 17 cancer subtypes tested.
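The two evaluation metrics named above, the count of well-predicted genes and the median Pearson correlation of the top predicted genes, can be illustrated with a short sketch. This is a minimal Python/NumPy illustration, not DeepPT's evaluation code: it assumes actual and predicted expression are given as samples × genes matrices, and the function names and the `top_k` parameter are hypothetical, since the abstract does not state how many top genes enter the median.

```python
import numpy as np


def per_gene_pearson(actual: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    """Pearson correlation per gene (column), computed across samples (rows).

    Genes with zero variance in either matrix get NaN instead of a correlation.
    """
    a = actual - actual.mean(axis=0)
    p = predicted - predicted.mean(axis=0)
    num = (a * p).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    # 0/0 for constant genes yields NaN; silence the corresponding warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den


def count_well_predicted(r: np.ndarray, threshold: float = 0.4) -> int:
    """Number of genes whose correlation exceeds the threshold (NaNs ignored)."""
    valid = r[~np.isnan(r)]
    return int(np.count_nonzero(valid > threshold))


def median_of_top_k(r: np.ndarray, top_k: int) -> float:
    """Median correlation among the top_k best-predicted genes (NaNs ignored).

    top_k is a hypothetical parameter; the abstract does not specify it.
    """
    valid = r[~np.isnan(r)]
    top = np.sort(valid)[::-1][:top_k]
    return float(np.median(top))
```

Under this sketch, a "3-fold increase in the number of well-predicted genes" would correspond to `count_well_predicted` returning three times as many genes for one model's predictions as for another's on the same cohort.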
On the external validation data from UMB, DeepPT 2.0 improves gene expression prediction in 6 of the 7 tested cancer subtypes, with up to a 7-fold increase in the number of well-predicted genes, which mitigates concerns about overfitting to the training set. As we previously observed, immune genes are particularly well predicted: for a set of 826 hallmark immune genes, DeepPT 2.0 shows up to a 3.5-fold (TCGA) and 2-fold (UMB) higher percentage of well-predicted genes than for all other genes.

Conclusions: DeepPT 2.0 significantly improves upon competing methods for predicting mRNA expression from H&E slides across multiple metrics and diverse cancer types. Furthermore, it generalizes robustly to slides from previously unseen sources. Its strong performance in predicting the expression of immune genes suggests a potential benefit for predicting response to immunotherapy with the ENLIGHT-DeepPT platform, as indeed demonstrated elsewhere.
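The immune-gene comparison above is a fold ratio of *percentages* of well-predicted genes (immune set versus all other genes), not of raw counts. The following is a minimal sketch of that ratio, assuming a precomputed array of per-gene Pearson correlations aligned with a list of gene names; the helper names and placeholder gene symbols are hypothetical, and the 826-gene hallmark immune list itself is not given in the abstract.

```python
import numpy as np


def well_predicted_fraction(r: np.ndarray, mask: np.ndarray, threshold: float = 0.4) -> float:
    """Fraction of the genes selected by mask whose correlation exceeds threshold."""
    subset = r[mask]
    subset = subset[~np.isnan(subset)]  # ignore genes with undefined correlation
    if subset.size == 0:
        return float("nan")
    return float(np.mean(subset > threshold))


def immune_enrichment(r, gene_names, immune_genes, threshold=0.4):
    """Return (immune fraction, other fraction, fold ratio).

    The ratio is NaN when the fraction for non-immune genes is zero or undefined.
    """
    is_immune = np.isin(np.asarray(gene_names), list(immune_genes))
    frac_immune = well_predicted_fraction(r, is_immune, threshold)
    frac_other = well_predicted_fraction(r, ~is_immune, threshold)
    ratio = frac_immune / frac_other if frac_other > 0 else float("nan")
    return frac_immune, frac_other, ratio
```

Comparing fractions rather than counts keeps the ratio meaningful even though the immune set is much smaller than the rest of the transcriptome.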
oncology
What problem does this paper attempt to address?