Abstract: The past decade has seen an explosion in the amount of digital information generated within the healthcare domain. Digital data exist in the form of images, video, speech, transcripts, electronic health records, clinical records, and free text. Analysis and interpretation of healthcare data are a daunting task that demands a great deal of time, resources, and human effort. In this paper, we focus on the problem of co-morbidity recognition from patients' clinical records. To this end, we employ both classical machine learning and deep learning approaches, using word embeddings and bag-of-words representations coupled with feature selection techniques. The goal of our work is to develop a classification system that identifies whether a certain health condition occurs in a patient by studying his/her past clinical records. In more detail, we have used pre-trained word2vec, domain-trained, GloVe, fastText, and Universal Sentence Encoder embeddings to classify sixteen morbidity conditions within clinical records. We have compared the outcomes of classical machine learning and deep learning approaches across the employed feature representation and feature selection methods, and we present a comprehensive discussion of their performance and behaviour. Finally, we have also applied ensemble learning techniques over a large number of classifier combinations to improve on single-model performance. For our experiments, we used the n2c2 natural language processing research dataset released by Harvard Medical School, which consists of clinical notes containing patient discharge summaries. Given the class imbalance and small size of the data, the experimental results show the advantage of ensemble learning over single classifier models.
In particular, ensemble learning slightly improved the performance of the single classification models but greatly reduced the variance of the predictions, stabilizing accuracy (i.e., yielding a lower standard deviation than single classifiers). In real-life scenarios, our work can be employed to identify patients' morbidity conditions with high accuracy by feeding their current clinical notes to our tool. Moreover, other domains where classification is a common problem might benefit from our approach as well.
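The abstract does not state how the classifiers' outputs are combined. As a minimal sketch, assuming simple hard (majority) voting on a single binary morbidity label (present/absent), the Python code below scores every 3- and 5-model combination drawn from a pool of classifiers and compares the mean and standard deviation of the ensembles' accuracies with those of the single models. The model names (`svm`, `rf`, `nb`, `lstm`, `cnn`), the helper functions, and all predictions are hypothetical toy values, not data or results from the paper.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def majority_vote(member_preds):
    """Hard voting: member_preds holds one prediction list per classifier,
    aligned by sample; return the most frequent label for each sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*member_preds)]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def ensemble_combinations(preds_by_model, y_true, sizes=(3, 5)):
    """Score a majority-vote ensemble for every combination of models of the
    given sizes. Odd sizes guarantee no ties on a binary label."""
    results = {}
    names = sorted(preds_by_model)
    for k in sizes:
        for combo in combinations(names, k):
            voted = majority_vote([preds_by_model[m] for m in combo])
            results[combo] = accuracy(y_true, voted)
    return results


# Hypothetical toy data for ONE morbidity (1 = present, 0 = absent);
# model names and predictions are illustrative, not results from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
preds = {
    "svm":  [1, 0, 1, 0, 0, 0, 1, 0],
    "rf":   [1, 1, 1, 1, 0, 0, 1, 0],
    "nb":   [0, 0, 1, 1, 0, 0, 1, 0],
    "lstm": [1, 0, 1, 1, 1, 0, 1, 0],
    "cnn":  [1, 0, 0, 1, 0, 1, 1, 0],
}

single = {m: accuracy(y_true, p) for m, p in preds.items()}
combo_scores = ensemble_combinations(preds, y_true)

single_acc = list(single.values())
combo_acc = list(combo_scores.values())
print("single models: mean=%.3f std=%.3f" % (mean(single_acc), pstdev(single_acc)))
print("ensembles:     mean=%.3f std=%.3f" % (mean(combo_acc), pstdev(combo_acc)))
```

In this toy pool each model errs on different notes, so every voting ensemble recovers the gold labels; real classifiers tend to make correlated errors, so gains are usually more modest. Even-sized ensembles would need an explicit tie-break rule, or soft voting (averaging predicted probabilities), which is a common alternative to hard voting.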

Neural translation and automated recognition of ICD10 medical entities from natural language

Automatic ICD-10 Code Association: A Challenging Task on French Clinical Texts

Med7: a transferable clinical natural language processing model for electronic health records

An Encoder-Decoder Model for ICD-10 Coding of Death Certificates

Neural machine translation of clinical procedure codes for medical diagnosis and uncertainty quantification

Using natural language processing for automated classification of disease and to identify misclassified ICD codes in cardiac disease

Detecting of a Patient's Condition From Clinical Narratives Using Natural Language Representation

Deep-ADCA: Development and Validation of Deep Learning Model for Automated Diagnosis Code Assignment Using Clinical Notes in Electronic Medical Records

Automated ICD Coding Based on Neural Machine Translation

Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks

Automatic ICD-10 Coding and Training System: Deep Neural Network Based on Supervised Learning

Public Health Informatics: Proposing Causal Sequence of Death Using Neural Machine Translation

Neura: a specialized large language model solution in neurology

Multilabel classification of medical concepts for patient clinical profile identification

GAMedX: Generative AI-based Medical Entity Data Extractor Using Large Language Models

A Neural Architecture for Automated ICD Coding

Ensembling Classical Machine Learning and Deep Learning Approaches for Morbidity Identification From Clinical Notes

Towards Automated ICD Coding Using Deep Learning

Natural language processing of MIMIC-III clinical notes for identifying diagnosis and procedures with neural networks

Modelling long medical documents and code associations for explainable automatic ICD coding

DiLBERT: Cheap Embeddings for Disease Related Medical NLP