Abstract:

Background: Breast cancer is a complex disease that affects millions of people and is the leading cause of cancer death among women worldwide. There is therefore still a need for new tools to improve treatment outcomes for breast cancer patients. Electronic Health Records (EHRs) contain a wealth of patient information, from pathology reports to biological measurements, that could serve this end but remains largely unexploited. Recent methodological developments in deep learning, however, open the way to new methods that leverage this information to improve patient care.

Methods: In this study, we propose M-BEHRT, a Multimodal BERT for EHR data. M-BEHRT builds on BEHRT, which is itself based on the popular natural language architecture BERT (Bidirectional Encoder Representations from Transformers). M-BEHRT models multimodal patient trajectories as sequences of medical visits, each comprising a variety of information: clinical features, biological lab test results, medical department and procedure, and the content of free-text medical reports. M-BEHRT is pretrained with a task analogous to masked language modeling, which lets it learn a representation of patient trajectories from data that include records left unlabeled by censoring; it is then fine-tuned on the classification task at hand. Finally, we use a gradient-based attribution method to highlight which parts of the input patient trajectory are most relevant for the prediction.

Results: We apply M-BEHRT to a retrospective cohort of about 15,000 breast cancer patients from Institut Curie (Paris, France) treated with adjuvant chemotherapy, using patient trajectories up to one year after surgery to predict disease-free survival (DFS). M-BEHRT achieves an AUC-ROC of 0.77 [0.70-0.84] on a held-out data set for the prediction of DFS 3 years after surgery, compared to 0.67 [0.58-0.75] for the Nottingham Prognostic Index (NPI) and for a random forest (p = 0.031 and p = 0.050, respectively). In addition, we identify subsets of patients for which M-BEHRT performs particularly well, such as older patients with at least one affected lymph node.

Conclusion: We propose a novel deep learning algorithm to learn from multimodal EHR data. Trained on about 15,000 patient records, our model achieves state-of-the-art performance on two classification tasks. The EHR data used for these tasks were more homogeneous than other datasets used for pretraining, as they comprised only adjuvant-treated breast cancer patients. This highlights both the potential of EHR data to improve our understanding of breast cancer and the ability of transformer-based architectures to learn from EHR datasets with far fewer records than the millions typically used in published studies. The representation of patient trajectories used by M-BEHRT captures their sequential nature and opens new research avenues for understanding complex diseases and improving patient care.
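The abstract describes patient trajectories as sequences of visits and a pretraining task analogous to masked language modeling, without giving details. The Python sketch below illustrates the general idea under stated assumptions. It follows BEHRT's convention of a leading [CLS] token, a [SEP] token closing each visit, and a per-token visit index. It then applies BERT's standard masking scheme: 15% of tokens are selected; of those, 80% are replaced by [MASK], 10% by a random token, and 10% are left unchanged. The token strings, the ignore label -100, and the functions `flatten_trajectory` and `mask_tokens` are hypothetical and not taken from M-BEHRT. How M-BEHRT actually encodes lab values or free-text reports is not shown here.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical special tokens; M-BEHRT's real vocabulary is not described in the abstract.
CLS, SEP, MASK, PAD = "[CLS]", "[SEP]", "[MASK]", "[PAD]"
IGNORE = -100  # label for positions that are not predicted (a common convention, assumed here)


def flatten_trajectory(visits: Sequence[Sequence[str]]) -> Tuple[List[str], List[int]]:
    """Turn a list of visits (each a list of codes) into one token sequence.

    Also returns, for each token, the index of the visit it belongs to, which a
    BEHRT-style model could feed to a visit-level position embedding.
    """
    tokens, visit_ids = [CLS], [0]
    for i, visit in enumerate(visits):
        tokens.extend(visit)
        visit_ids.extend([i] * len(visit))
        tokens.append(SEP)  # [SEP] closes each visit
        visit_ids.append(i)
    return tokens, visit_ids


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[object]]:
    """BERT-style masking (assumed): select about mask_prob of the non-special tokens.

    Selected positions become [MASK] (80%), a random vocabulary token (10%),
    or stay unchanged (10%). Labels hold the original token at selected
    positions and IGNORE everywhere else.
    """
    rng = rng or random.Random()
    inputs: List[str] = list(tokens)
    labels: List[object] = [IGNORE] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in (CLS, SEP, PAD) or rng.random() >= mask_prob:
            continue  # special tokens are never masked
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(list(vocab))
        # else: keep the original token in the input
    return inputs, labels


if __name__ == "__main__":
    # Hypothetical event codes, for illustration only.
    visits = [["diag:C50", "lab:HER2+"], ["proc:surgery"], ["drug:chemo", "lab:CA15-3"]]
    tokens, visit_ids = flatten_trajectory(visits)
    vocab = sorted({code for visit in visits for code in visit})
    inputs, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
    print(inputs)
    print(labels)
```

During pretraining, the model would be asked to recover `labels` at the selected positions from `inputs`. This objective needs no outcome label, which is why censored patients can still contribute to pretraining.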
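The NPI baseline is a closed-form clinical score. A minimal Python sketch of its standard definition follows: NPI = 0.2 × tumor size (cm) + node score + histological grade. The node score is 1 for no involved lymph nodes, 2 for 1 to 3 involved nodes, and 3 for 4 or more. Higher values indicate a worse prognosis. The function names are hypothetical. The abstract does not say how the NPI was turned into a 3-year DFS prediction; one possibility is using the raw score as a risk score when computing the AUC-ROC.

```python
def node_score(positive_nodes: int) -> int:
    """Lymph node score used by the NPI: 1 (no involved nodes), 2 (1-3), 3 (4 or more)."""
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        return 1
    if positive_nodes <= 3:
        return 2
    return 3


def nottingham_prognostic_index(tumor_size_cm: float, positive_nodes: int, grade: int) -> float:
    """NPI = 0.2 * tumor size (cm) + node score + histological grade (1-3)."""
    if tumor_size_cm < 0:
        raise ValueError("tumor_size_cm must be non-negative")
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2 or 3")
    return 0.2 * tumor_size_cm + node_score(positive_nodes) + grade
```

For example, a 2 cm, node-negative, grade 2 tumor scores 0.2 × 2 + 1 + 2 = 3.4.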
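The abstract reports each AUC-ROC with a bracketed interval but does not say how the interval was obtained. Assuming a percentile bootstrap over held-out patients (a common choice, not confirmed by the source), the Python sketch below computes the AUC-ROC as the probability that a randomly chosen positive is scored above a randomly chosen negative. It then resamples patients with replacement to get an interval. `auc_roc` and `bootstrap_auc_ci` are hypothetical names, and labels are assumed to be coded 0/1.

```python
import random
from typing import List, Sequence, Tuple


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    y_true: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Point AUC plus a percentile-bootstrap (1 - alpha) interval over resampled patients."""
    point = auc_roc(y_true, scores)  # also checks that both classes are present
    k = int(round((alpha / 2) * n_boot))
    if n_boot <= 0 or k >= n_boot - k:
        raise ValueError("n_boot too small for the requested alpha")
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y_true[i] for i in idx]
        if 0 < sum(y_b) < n:  # skip resamples that contain only one class
            stats.append(auc_roc(y_b, [scores[i] for i in idx]))
    stats.sort()
    return point, (stats[k], stats[n_boot - 1 - k])
```

The pairwise AUC is quadratic in the number of patients. It is fine for illustration, but for a cohort of thousands of patients a rank-based computation such as scikit-learn's `roc_auc_score` is the practical choice. Model comparisons like the reported p-values would need a separate test, which this sketch does not cover.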