Multimodal Learning for Cardiovascular Risk Prediction using EHR Data

Ayoub Bagheri, T. Katrien J. Groenhof, Wouter B. Veldhuis, Pim A. de Jong, Folkert W. Asselbergs, Daniel L. Oberski
DOI: https://doi.org/10.1145/3388440.3414924
2020-09-21
Abstract: Electronic health records (EHRs) contain structured and unstructured data of significant clinical and research value. Various machine learning approaches have been developed to employ information in EHRs for risk prediction. The majority of these attempts, however, focus on structured EHR fields and lose the vast amount of information in the unstructured texts. Deep neural networks, on the other hand, have gained tremendous momentum in knowledge discovery from EHR texts, yet very few studies have used both the free text and the structured information in EHRs for clinical prediction. To exploit the potential information captured in EHRs, in this study we propose MI-BiLSTM, a multimodal bidirectional long short-term memory-based framework for cardiovascular risk prediction that integrates medical texts and structured clinical information. The MI-BiLSTM framework concatenates word embeddings from x-ray reports with classical clinical predictors from the Second Manifestations of ARTerial disease (SMART) study [1], before passing them to a final fully connected neural network. In the experiments, we used the proposed framework to compare the performance of different deep neural network architectures on data from 5603 patients using 5-fold cross-validation. Evaluated on the SMART study, we demonstrate the clinical relevance of integrating text features and classical predictors for cardiovascular risk prediction in patients with manifest vascular disease or at high risk for cardiovascular disease. Our results show that the MI-BiLSTM framework using text data in addition to laboratory values outperforms deep learning models using only known clinical predictors. In future work, we will focus on expanding our multimodal framework to import knowledge from available medical ontologies to enhance the quality of clinical decision making in risk prediction models.
An open-source implementation of the proposed framework is publicly available at https://github.com/bagheria/CardioRisk-TextMining
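
The fusion idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the repository above for that); it is a minimal, untrained, standard-library-only example that assumes the text representation is the final hidden state of a forward and a backward LSTM pass over the report's word embeddings, and that a single logistic output unit stands in for the final fully connected network. All names (`lstm_last_state`, `predict_risk`, `init_params`), the dimensions, and the input values are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def lstm_last_state(seq, W, hidden):
    """Run one LSTM direction over seq and return the final hidden state.

    W maps the concatenation [x; h] to 4*hidden gate pre-activations,
    ordered as input, forget, output, candidate. Biases are omitted for brevity.
    """
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in seq:
        z = matvec(W, x + h)
        i = [sigmoid(a) for a in z[:hidden]]
        f = [sigmoid(a) for a in z[hidden:2 * hidden]]
        o = [sigmoid(a) for a in z[2 * hidden:3 * hidden]]
        g = [math.tanh(a) for a in z[3 * hidden:]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def init_params(vocab, emb_dim, hidden, n_clinical, seed=0):
    """Random, untrained weights; a real model would learn these."""
    rng = random.Random(seed)
    return {
        "hidden": hidden,
        "embedding": rand_matrix(vocab, emb_dim, rng),
        "W_fwd": rand_matrix(4 * hidden, emb_dim + hidden, rng),
        "W_bwd": rand_matrix(4 * hidden, emb_dim + hidden, rng),
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden + n_clinical)],
        "b_out": 0.0,
    }


def predict_risk(token_ids, clinical, params):
    """Encode the report in both directions, append the structured
    predictors, and map the fused vector to a probability-like score."""
    emb = [params["embedding"][t] for t in token_ids]
    h_fwd = lstm_last_state(emb, params["W_fwd"], params["hidden"])
    h_bwd = lstm_last_state(emb[::-1], params["W_bwd"], params["hidden"])
    fused = h_fwd + h_bwd + list(clinical)  # multimodal concatenation
    logit = sum(w * x for w, x in zip(params["w_out"], fused)) + params["b_out"]
    return sigmoid(logit), fused


# Hypothetical inputs: four token ids from a report and three
# standardized clinical predictors (made-up values).
params = init_params(vocab=50, emb_dim=8, hidden=4, n_clinical=3)
report_tokens = [3, 17, 42, 8]
clinical_values = [0.62, -1.1, 0.3]
risk, fused = predict_risk(report_tokens, clinical_values, params)
```

Because the weights are random, the resulting score carries no clinical meaning. The point of the sketch is the shape of the computation: the fused vector has length `2 * hidden + n_clinical`, so the output layer sees the text features and the structured predictors side by side.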