Abstract:

Background: PAM50, a 50-gene signature, classifies breast cancers into one of five subtypes (basal, luminal A, luminal B, HER2-enriched, and normal-like). It reveals information about underlying tumor biology and has emerged as a key prognostic indicator influencing treatment decisions. There is growing interest in bridging the gap between expression-based metrics and histopathology, and immunohistochemistry (IHC) and sequencing-based approaches have been proposed for this purpose. However, IHC and sequencing-based approaches require additional tissue and specialized processing and/or analysis, whereas hematoxylin and eosin (H&E)-stained slides are ubiquitously used by pathologists for cancer diagnosis. Here, we describe a computer vision-based approach to predict PAM50 classification using H&E-stained whole slide images (WSIs).

Methods: We obtained expression-based PAM50 subtype labels and corresponding H&E-stained WSIs for 961 breast carcinomas from the TCGA BRCA cohort. We used two separate machine learning (ML) approaches to predict PAM50 subtypes from WSIs. In the first approach, we deployed previously trained PathExplore models to extract quantitative human-interpretable features (HIFs) that summarize the tumor microenvironment (TME), and subsequently trained random forest classification models on these HIFs to predict PAM50 subtypes. For the second approach, we developed additive multiple instance learning (aMIL) models. Additionally, we explored the effects of PAM50 subtype labeling and aggregation strategies beyond the 5-class approach. Our 3-class approach combines Luminal A and B, as seen in IHC efforts to increase agreement with PAM50 assays, while excluding Normal, a category containing few and heterogeneous samples. We also performed binary classification for each subtype in the 3-class model (e.g., Luminal vs. other). Slides were split into training (60%), validation (20%), and test (20%) sets, stratified by PAM50 labels. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC) on the held-out test set, using a one-vs.-rest approach for multi-class models. To establish a baseline for PAM50 prediction, we developed random forest classification models using only clinical covariates (tumor stage, histologic grade, histological subtype, and BRCA1/2 status).

Results: We compared the performance of our two ML models (HIF and aMIL) to that of the baseline model, and we report the AUROC values in Table 1. Both models performed well in predicting Basal, Luminal A, Luminal B, and Luminal (A+B), while performance was weaker for the HER2-enriched and Normal classes. The three-class model predicted Luminal classifications better than the five-class model, which separates Luminal A and B. Although simplifying a classification problem to a binary task typically improves performance, this was not observed for any of the PAM50 subtypes.

Conclusions: These results demonstrate that AI-powered digital pathology can accurately and reproducibly perform molecular-based classification tasks, such as predicting PAM50 classifications, using WSIs, suggesting a more efficient path toward clinically relevant breast cancer characterization.

Table 1. Performance of all models in predicting PAM50 molecular subtypes. AUROC values are shown. Shaded cells represent the best test-set performance for each class (row).

Citation Format: Maria Guramare, Syed Ashar Javed, Christian Kirkup, Dinkar Juyal, Jacqueline Brosnan-Cashman, Victoria Mountain, Ryan Leung, Bahar Rahsepar, John Abel, Amaro Taylor-Weiner, Jake Conway. Prediction of PAM50 molecular subtypes from H&E-stained breast cancer specimens using tumor microenvironment features and additive multiple instance learning models [abstract]. In: Proceedings of the 2023 San Antonio Breast Cancer Symposium; 2023 Dec 5-9; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2024;84(9 Suppl) nr PO3-07-04.
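
The labeling and evaluation protocol described in Methods (merging Luminal A and B, excluding Normal, a 60/20/20 split stratified by label, and one-vs.-rest AUROC) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the label strings ("LumA", "LumB", "Her2", and so on), the function names, the rounding rule used for split sizes, and the seed are all assumptions made for illustration, and the classifier itself (random forest or aMIL) is left out.

```python
import random
from collections import defaultdict

# Hypothetical label strings; the abstract does not specify an encoding.
# Luminal A and B are merged, and Normal is excluded (mapped to None).
THREE_CLASS = {
    "LumA": "Luminal",
    "LumB": "Luminal",
    "Basal": "Basal",
    "Her2": "HER2",
    "Normal": None,
}


def to_three_class(label):
    """Map a 5-class PAM50 label to the 3-class scheme (None = excluded)."""
    return THREE_CLASS[label]


def to_binary(label3, positive):
    """Binary target for one subtype of the 3-class scheme, e.g. Luminal vs. other."""
    return 1 if label3 == positive else 0


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) index lists, splitting within each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    train, val, test = [], [], []
    for lab in sorted(by_label):
        idxs = by_label[lab]
        rng.shuffle(idxs)
        # Assumed rounding rule: round train and val sizes, give the rest to test.
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test


def auroc(y_true, scores):
    """Binary AUROC as P(positive score > negative score); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both positive and negative examples")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(labels, class_scores):
    """class_scores maps each class to its per-sample predicted scores."""
    return {
        cls: auroc([1 if lab == cls else 0 for lab in labels], scores)
        for cls, scores in class_scores.items()
    }
```

In use, the held-out test labels and a model's per-class scores would be passed to `one_vs_rest_auroc`, yielding one AUROC per subtype, which corresponds to one row of a table such as Table 1. The pairwise AUROC above is quadratic in the number of samples, which is acceptable for a cohort of this size but would normally be replaced by a rank-based implementation.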

Abstract 5408: AI powered-platform to predict gene modifications from prostate and breast cancer whole slide images

Abstract 462: Using attention-based deep multiple instance learning to identify key genetic alterations in prostate cancer from whole slide images

Abstract 7669: Use of computational pathology to predict biochemical recurrence in prostate cancer (Pca) (Rp) patients following radical prostatectomy: Evaluation in low decipher risk category

A novel, AI-generated morphologic biomarker to predict prostate cancer recurrence in patients with intermediate risk of progression.

Abstract 4938: Prediction of gene mutation from colorectal adenocarcinoma whole slide images via integrated deep learning pipeline

Abstract 4970: Multi-modal machine learning approaches for predicting cancer type and Gleason grade leveraging public TCGA data

Abstract PO3-07-04: Prediction of PAM50 molecular subtypes from H&E-stained breast cancer specimens using tumor microenvironment features and additive multiple instance learning models

Abstract 4298: Predicting tumor evolution from digital histology using AI

Abstract 5354: Prediction of OncotypeDX high risk group for chemotherapy benefit in breast cancer by deep learning analysis of hematoxylin and eosin-stained whole slide images

Abstract 5436: Developing artificial intelligence algorithms to predict response to neoadjuvant chemotherapy in HER2-positive breast cancer

Semi-Supervised, Attention-Based Deep Learning for Predicting TMPRSS2:ERG Fusion Status in Prostate Cancer Using Whole Slide Images

Abstract 3521: Robust inference of PTEN deletion in prostate cancer from routine histopathology whole slide images using attention-based deep learning

Abstract B092: Development of predictive models for expression of a tumor specific biomarker and CD3 on H&E digital slides

Abstract PO4-01-10: Multi-modal artificial intelligence models from baseline histopathology predict prognosis in HR+ HER2- early breast cancer

Abstract 5058: Recurrence risk prediction based on automatic histopathologic analysis of breast cancer using whole slide images

Abstract A037: Predicting pancreatic cancer using artificial intelligence analysis of pancreatic subregions using computed tomography images

Artificial intelligence in digital histopathology for predicting patient prognosis and treatment efficacy in breast cancer

Abstract PO3-07-05: Multi-site validation of a deep learning solution for ER/PR profiling of breast cancer from H&E-stained pathology slides

Transcriptome-wide prediction of prostate cancer gene expression from histopathology images using co-expression based convolutional neural networks

Comparison of whole slide image–based deep learning algorithms and genomic classifiers for assessing the risk of prostate cancer metastasis in surgically treated patients.

An Integrated Digital Pathology Platform for Tumors Using Artificial Intelligence Analysis