Abstract: The National Cancer Institute’s (NCI) Surveillance, Epidemiology, and End Results (SEER) registries maintain and organize cancer incidence information, allowing researchers to derive valuable insights into cancer epidemiology. While significant attention has been devoted to identifying cancers either from clinical text or from tabular data collected by SEER registries, there has been less emphasis on integrating these distinct data modalities. In our multimodal deep learning approach, we use longitudinal tabular data from the Consolidated Tumor Case (CTC) database that encompass a patient’s past diagnoses. This tabular information can augment clinical text to aid in the classification of pathology reports indicative of recurrent cancers. Four NCI SEER registries (Louisiana, New Jersey, Seattle, and Utah) have manually labeled 61,150 pathology reports with one of six categories, which we refine into a four-class classification problem. Each pathology report is identified as positive for recurrence, negative for recurrence/not disease free, new tumor, or “other” (no malignancy/uncertain). Natural language processing techniques can extract meaningful information from clinical pathology reports, aiding in the identification of subtle indicators of recurrence by using relevant context. We use a hierarchical self-attention network (HiSAN) to construct document embeddings and classify the pathology report. To further enhance the predictive accuracy of our modeling approach, we fuse the textual information from a pathology report with categorical data about the patient’s cancer history. For each report, we create a patient context vector that encapsulates tumor-level information from the patient’s previous cancer(s). The selected CTC records are associated with cancers diagnosed more than 120 days before the date of biospecimen collection stated in the pathology report.
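The record-selection rule in the last sentence can be illustrated with a minimal Python sketch. The record layout (a dict with a `diagnosis_date` field) and the function name `select_prior_records` are hypothetical and not taken from the CTC database or the authors' code; only the 120-day threshold comes from the abstract.

```python
from datetime import date, timedelta


def select_prior_records(ctc_records, collection_date, min_gap_days=120):
    """Keep records diagnosed MORE than `min_gap_days` before collection.

    `ctc_records` is assumed to be a list of dicts with a `diagnosis_date`
    (datetime.date) entry; this layout is illustrative only.
    """
    cutoff = collection_date - timedelta(days=min_gap_days)
    # Strict inequality: a diagnosis exactly 120 days earlier is excluded.
    return [r for r in ctc_records if r["diagnosis_date"] < cutoff]


# Hypothetical example data for one pathology report.
collection = date(2020, 6, 1)
records = [
    {"tumor_id": "A", "diagnosis_date": date(2019, 1, 15)},
    {"tumor_id": "B", "diagnosis_date": collection - timedelta(days=120)},
    {"tumor_id": "C", "diagnosis_date": date(2020, 5, 1)},
]
prior = select_prior_records(records, collection)
print([r["tumor_id"] for r in prior])
```

Only tumor A qualifies here: tumor B sits exactly on the 120-day boundary and tumor C falls inside the window, so both are left out.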
The patient context vector is built from diverse categorical features, including cancer staging, patient age, treatment, and sites of metastasis at the time of diagnosis. Features are represented using a combination of one-hot encoding and binning. Additionally, we employ patient- and feature-level normalization to maintain the proportional significance of features for individuals with multiple past diagnoses. We present preliminary results for three approaches to classifying cancer recurrence. First, using only the pathology reports as input yields an accuracy of 68%. Second, using only CTC features with an XGBoost model yields an accuracy of 49%. Finally, leveraging both data modalities, i.e., HiSAN-generated pathology report embeddings and CTC data, significantly improves the model’s predictive accuracy to 76%. This research demonstrates a promising path forward in enhancing the classification of clinical text by incorporating longitudinal patient history data.

Citation Format: Patrycja Krawczuk, Zachary Fox, Dakota Murdock, Jennifer Doherty, Antoinette Stroupe, Stephen M. Schwartz, Lynne Penberthy, Elizabeth Hsu, Serban Negoita, Valentina Petkov, Heidi Hanson. Multimodal machine learning for the automatic classification of recurrent cancers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 2318.
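One way to picture the patient context vector described in the abstract is the following Python sketch, which combines one-hot encoding, age binning, and a per-patient normalization. The stage vocabulary, the age bin edges, and the choice to average over a patient's prior tumors are all assumptions made for illustration; the abstract does not specify the actual feature set or normalization scheme.

```python
# Hypothetical vocabularies; the real CTC coding schemes are not given
# in the abstract.
STAGES = ["localized", "regional", "distant", "unknown"]
AGE_BIN_EDGES = [40, 60, 80]  # bins: <40, 40-59, 60-79, 80+


def one_hot(value, vocabulary):
    """Return a 0/1 list with a 1 at the position of `value`."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def age_bin(age, edges=AGE_BIN_EDGES):
    """Map an age to a bin index by counting the edges it reaches."""
    return sum(age >= e for e in edges)


def encode_tumor(record):
    """Encode one prior tumor as stage one-hot + binned-age one-hot."""
    stage_vec = one_hot(record["stage"], STAGES)
    age_vec = one_hot(age_bin(record["age"]), range(len(AGE_BIN_EDGES) + 1))
    return stage_vec + age_vec


def patient_context_vector(records):
    """Average tumor encodings so each feature block sums to 1.

    Averaging is an assumed stand-in for the normalization in the
    abstract: it keeps patients with many prior diagnoses on the same
    scale as patients with one.
    """
    width = len(STAGES) + len(AGE_BIN_EDGES) + 1
    if not records:
        return [0.0] * width
    totals = [0.0] * width
    for record in records:
        for i, x in enumerate(encode_tumor(record)):
            totals[i] += x
    return [t / len(records) for t in totals]


history = [
    {"stage": "localized", "age": 55},
    {"stage": "distant", "age": 62},
]
context = patient_context_vector(history)
print(context)
```

With two prior tumors, each active category receives a weight of 0.5, so the stage block and the age block each sum to 1 regardless of how many diagnoses the patient has had.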
