Abstract:

Background: Although several gene expression-based assays are validated for informing prognosis and treatment decision-making for breast cancer (BC) patients, their uptake has been hampered by technical complexity and cost, particularly in underrepresented and low-resource settings. Here, we explored whether machine learning-based features derived from standard hematoxylin and eosin (H&E)-stained images can be used in conjunction with routinely available pathology data (pathomics) to infer the results of clinically relevant tumor genomic assays among BC patients from a sub-Saharan African population.

Methods: This study comprised 563 BC patients with diagnostic H&E-stained images, clinicopathological data, and Nanostring gene expression data from the Ghana Breast Health Study (GBHS), a population-based case-control study that recruited BC patients between 2013 and 2015. H&E images were analyzed using high-accuracy machine learning algorithms to extract data on several prospectively selected, human-interpretable imaging features. The assessed features encompassed characteristics of the tumor (e.g., average nuclear size, nuclear optical density, and nuclear roundness) and of the stroma (e.g., degree of stromal cellularity, stromal cell phenotype, and extent of stromal desmoplasia, remodeling, and necrosis). Nanostring technology was used to generate data on PAM50 subtype [luminal (A and B), non-luminal (HER2-enriched and Basal), normal-like], 21-gene recurrence score (RS), and TP53 pathway function. Multivariable logistic regression models were fitted to luminal (vs other subtypes), non-luminal (vs other subtypes), TP53 mutant-like (vs wildtype-like), and RS (4th vs 1st quartile) data to develop predictive classifiers in a discovery set (60% of the data). The performance of the classifiers was then tested in a held-out internal validation set (40% of the data).
Results: In the discovery set, the pathomics-based classifiers achieved varying but excellent discriminatory accuracy [AUROC = 0.90 (0.86-0.94), 0.94 (0.91-0.97), 0.95 (0.91-0.98), and 0.96 (0.94-0.99) for TP53, non-luminal, RS, and luminal classification, respectively]. In the held-out validation set, the corresponding AUROC values were 0.82 (0.75-0.88), 0.87 (0.81-0.92), 0.85 (0.77-0.92), and 0.88 (0.83-0.93). The TP53 classifier was correlated with the luminal (R=-0.73), non-luminal (R=0.80), and RS (R=0.77) classifiers. The distribution of the pathomics-based TP53 probability score varied considerably between PAM50 subtypes (P<0.0001) and according to RS categories within both ER+ (P=0.009) and ER- (P=0.006) BC subtypes in the validation set.

Conclusion: H&E stains are cost-effective and routinely performed as part of the diagnostic workup for BC patients. Accordingly, the results open promising avenues for the use of interpretable, machine learning-based H&E imaging features and pathology data to infer breast tumor genomic signatures and prognosis in low-resource settings. Further work is required to validate these findings in independent populations.

Citation Format: Mustapha Abubakar, Amber N. Hurson, Thomas U. Ahearn, Ebonee N. Butler, Alina M. Hamilton, Maire A. Duggan, Scott M. Lawrence, Ernest Adjei, Joe-Nat Clegg-Lamptey, Joel Yarney, Beatrice Wiafe-Addai, Baffour Awuah, Seth Wiafe, Kofi Nyarko, Francis Aitpillah, Daniel Ansong, Stephen Hewitt, Louise A. Brinton, Melissa A. Troester, Lawrence Edusei, Nicolas Titiloye, Jonine D. Figueroa, Montserrat Garcia-Closas. Pathomics-based classifiers for inferring breast cancer genomic assays and prognosis in sub-Saharan Africa: Results from the Ghana Breast Health Study [abstract]. In: Proceedings of the 17th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2024 Sep 21-24; Los Angeles, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2024;33(9 Suppl):Abstract nr C022.
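
The discovery/validation design described in the Methods (fit a multivariable logistic regression on 60% of the patients, then report AUROC on the held-out 40%) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the data are synthetic, the two features and their effect sizes are hypothetical stand-ins for the pathomics features, and the plain gradient-descent fit and its settings (`lr`, `epochs`) are not taken from the study, which does not state its software or fitting procedure.

```python
import math
import random


def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, X):
    """Predicted probabilities for weights w = [intercept, coef_1, ..., coef_d]."""
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi, pi in zip(X, y, predict_proba(w, X)):
            err = pi - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def split_discovery_validation(n, discovery_frac=0.6, seed=0):
    """Shuffle indices 0..n-1 and split them into discovery and validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * discovery_frac)
    return idx[:cut], idx[cut:]


def make_synthetic(n=563, seed=1):
    """Synthetic cohort: binary label plus two hypothetical standardized features."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.4 else 0
        # Feature means are shifted by class; the shifts are arbitrary.
        X.append([rng.gauss(1.0 * label, 1.0), rng.gauss(-0.8 * label, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    disc, val = split_discovery_validation(len(X))
    w = fit_logistic([X[i] for i in disc], [y[i] for i in disc])
    for name, idx in (("discovery", disc), ("validation", val)):
        probs = predict_proba(w, [X[i] for i in idx])
        print(name, "n =", len(idx), "AUROC =", round(auroc([y[i] for i in idx], probs), 3))
```

Because the model is fitted only on the discovery indices, the validation AUROC gives a less optimistic estimate of discrimination, which is consistent with the drop between discovery and validation values reported in the Results.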

Abstract 4090: Creating research quality cancer genomic data from electronic health records

Abstract 5721: Automated annotation for large-scale clinicogenomic models of lung cancer treatment response and overall survival

Natural Language Processing to Identify Abnormal Breast, Lung, and Cervical Cancer Screening Test Results from Unstructured Reports to Support Timely Follow-up.

Abstract 5028: Integration of molecular data into cancer patient database

Abstract 2315: AI-enabled precision oncology era: Advanced and interactive interpretation of next-generation sequencing (NGS) reports

Abstract 3569: Using AI to automatically process data from unstructured health records of patients with lung cancer

Abstract 3892: Systematic generation of a clinicogenomic harmonized oncologic real-world dataset (MSK-CHORD)

Automated medical chart review for breast cancer outcomes research: a novel natural language processing extraction system

Extraction of Unstructured Electronic Health Records to Evaluate Glioblastoma Treatment Patterns

Automated real-world data integration improves cancer outcome prediction

Abstract 4966: Machine learning and large language model approach to pancancer data elements

Abstract 2146: A quantitative in vivo pharmacogenomics platform uncovers biomarkers of therapy response

Novel approach to implementing natural language processing for clinical staging of non-small-cell lung cancer.

Abstract 755: Supporting Precision Cancer Treatment Decision with Functional Evaluation of Cancer Gene Mutations and Variants

Natural language processing for populating lung cancer clinical research data

Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes

Abstract 3532: Transforming genomic data into images for enhanced deep learning in precision oncology

Abstract 909: Enhancing genomic analysis in cancer diagnostics: A machine learning approach for removing artifacts in FFPE specimens

Abstract C022: Pathomics-based classifiers for inferring breast cancer genomic assays and prognosis in sub-Saharan Africa: Results from the Ghana Breast Health Study

Abstract 939: Comprehensive genomic and transcriptomic analysis to guide therapy for patients with metastatic solid tumors

Abstract 1774: Examination of variants of unknown significance (VUSs) and co-occurring mutations from comprehensive genomic profiling (CGP) results in a cross tumor model