Abstract: Introduction: Natural language processing has recently achieved unprecedented performance on several medical tasks, but further improvements are needed in oncology. Moreover, very few projects have assessed the potential of language models to help prevent the most frequent serious or severe medical events in medical oncology. We aim to predict nausea or vomiting (ICD-10 code R11) and fatigue or malaise (ICD-10 code R53) from patients' medical reports. Material and methods: The study included all patients of Centre Léon Bérard between 2000 and 2023 who did not object to sharing their data for this analysis. We retrieved all clinical notes and the manual ICD-10 coding of hospitalization stays, in French. We pretrained a BERT language model on these data with a masking strategy, then fine-tuned it and compared it with several open-source pretrained medical models (DrBert and K-memBERT). The labels were medical events leading to or associated with a hospitalization in the 90 days after each patient note. For OncoBERT, we included sequential reports from the patient's history, along with a time-encoding layer, and integrated them in a final transformer layer. Results: We analyzed 140,523 patients, representing 2,515,957 pseudo-anonymized text reports and 6.6M hospitalization codes in total. The medical texts were mainly consultation reports (56%), end-of-stay reports (17%), and hospitalization summaries (9%). The most frequent types of oncology treatment received by patients at each time point were chemotherapy (18.8%), targeted therapies (10.8%), and immunotherapies (1.3%). The most frequent medical events were nausea or vomiting (20% of patients with 1 or more events) and fatigue or malaise (18% of patients with 1 or more events). In the final dataset, nausea and vomiting (R11) accounted for 16% of the labels, while malaise and fatigue (R53) accounted for 24.7% of the labels.
We performed random undersampling of reports without any event to balance the label dataset. Fine-tuning on R11 and R53 achieved a macro-AUCPR of 0.58 (OncoBERT) and 0.50 (best open-source model) on the validation set. Conclusion: The language models achieved high performance on the prediction of the most frequent serious medical events in our hospital dedicated to cancer care. We plan to validate these models externally at collaborating hospitals and prospectively, and to improve the interpretations, which we will present at the congress. Citation Format: Raphael Vienne, Quentin Filori, Vincent Susplugas, Hugo Crochet, Loic Verlingue. Prediction of nausea or vomiting, and fatigue or malaise in cancer care [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3475.
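As an illustration of the pipeline summarized in the abstract, the following is a minimal Python sketch of three steps: building 90-day labels for R11 and R53, randomly undersampling reports without any event, and computing a macro-averaged AUCPR. It is not the authors' implementation. All function names, the `keep_ratio` parameter, the data layout, and the choice of average precision as the AUCPR estimator are assumptions made for this example; the abstract does not specify the window boundaries, the undersampling ratio, or the estimator.

```python
import random
from datetime import date, timedelta

HORIZON = timedelta(days=90)  # prediction window stated in the abstract


def label_report(report_date, events, codes=("R11", "R53")):
    """Return one 0/1 label per code for a single report.

    `events` is a list of (hospitalization_date, icd10_code) pairs (assumed layout).
    A label is 1 if a matching event falls in (report_date, report_date + 90 days];
    whether the window boundaries are inclusive is an assumption.
    """
    return [
        int(any(c == code and report_date < d <= report_date + HORIZON
                for d, c in events))
        for code in codes
    ]


def undersample_negatives(rows, keep_ratio, seed=0):
    """Keep every report with at least one event; keep each report without
    any event with probability `keep_ratio` (hypothetical parameter).

    `rows` is a list of (report_id, labels) pairs.
    """
    rng = random.Random(seed)
    return [r for r in rows if any(r[1]) or rng.random() < keep_ratio]


def average_precision(y_true, y_score):
    """Average precision for one label: mean of precision at each positive,
    ranking by descending score. Tied scores are ordered arbitrarily here."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return float("nan")
    order = sorted(range(len(y_true)), key=lambda i: -y_score[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank
    return total / n_pos


def macro_aucpr(Y_true, Y_score):
    """Unweighted mean of the per-label average precision."""
    n_labels = len(Y_true[0])
    per_label = [
        average_precision([y[j] for y in Y_true], [s[j] for s in Y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels
```

Macro-averaging gives R11 and R53 equal weight regardless of how many positive labels each has, which matters here because the two labels have different prevalence in the final dataset.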
