Abstract:
Background: Trained coders in healthcare facilities assign diagnosis codes to medical records by reviewing all physician-authored documents associated with a patient's visit. This is a necessary and complex task in which coders must adhere to coding guidelines and assign every applicable code. With the growing popularity of electronic medical records (EMRs), computational approaches to code assignment have been proposed in recent years. However, most efforts have focused on single and often short clinical narratives, while realistic scenarios warrant full EMR-level analysis for code assignment.
Objective: We evaluate supervised learning approaches to automatically assign International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes to EMRs by experimenting with a large, realistic EMR dataset. The overall goal is to identify methods that offer superior performance in this task on such datasets.
Methods: We use a dataset of 71,463 EMRs corresponding to in-patient visits with discharge dates falling in a two-year period (2011-2012) from the University of Kentucky (UKY) Medical Center. We curate a smaller subset of this dataset and also use a third, gold standard dataset of radiology reports. We conduct experiments using different problem transformation approaches with feature and data selection components, employing suitable label calibration and ranking methods with novel features involving code co-occurrence frequencies and latent code associations.
Results: Over all codes with at least 50 training examples, we obtain a micro F-score of 0.48. On the set of codes that occur in at least 1% of the two-year dataset, we achieve a micro F-score of 0.54. For the smaller radiology report dataset, the classifier chaining approach yields the best results. For the smaller subset of the UKY dataset, feature selection, data selection, and label calibration offer the best performance.
Conclusions: We show that datasets of different scales (size of the EMRs, number of distinct codes) and with different characteristics warrant different learning approaches. For shorter narratives pertaining to a particular medical subdomain (e.g., radiology, pathology), classifier chaining is ideal given that the codes are highly related to each other. For realistic in-patient full EMRs, feature and data selection methods offer high performance on smaller datasets. However, for large EMR datasets, we observe that the binary relevance approach with learning-to-rank based code reranking offers the best performance. Regardless of the training dataset size, for general EMRs, label calibration to select the optimal number of labels is an indispensable final step.
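As a rough illustration of two ideas the abstract relies on, the sketch below shows how a micro F-score pools counts across all codes and how a per-record label count can cut a ranked code list down to a final prediction. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the per-code scores, and the label counts are all hypothetical, and the label calibration step is assumed to have already produced the number of codes to keep for each record.

```python
from typing import Dict, List, Set


def top_k_codes(scores: Dict[str, float], k: int) -> Set[str]:
    """Keep the k highest-scoring codes for one record (ties broken by code)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {code for code, _ in ranked[:k]}


def micro_f_score(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Micro-averaged F-score: pool TP, FP, and FN over all records and codes."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-code scores, e.g. from independent binary classifiers
# (binary relevance). The ICD-9-CM codes are illustrative only.
record_scores = [
    {"401.9": 0.91, "250.00": 0.62, "428.0": 0.15},
    {"401.9": 0.30, "250.00": 0.85, "428.0": 0.80},
]
gold_codes = [{"401.9"}, {"401.9", "250.00", "428.0"}]

# Assumed output of a label calibration step: how many codes to keep per record.
label_counts = [1, 2]

predicted_codes = [top_k_codes(s, k) for s, k in zip(record_scores, label_counts)]
score = micro_f_score(gold_codes, predicted_codes)
print(f"micro F-score: {score:.3f}")
```

Because the counts are pooled before precision and recall are computed, frequent codes dominate a micro F-score, which is one reason results are reported separately for code sets defined by minimum frequency.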

Characterizing the limitations of using diagnosis codes in the context of machine learning for healthcare

A Patient-Similarity-based Model for Diagnostic Prediction

Diagnosis Prevalence vs. Efficacy in Machine-learning Based Diagnostic Decision Support

An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records

Automated ICD coding for primary diagnosis via clinically interpretable machine learning

SCOPE: predicting future diagnoses in office visits using electronic health records

Gaussian Process Regression and Classification using International Classification of Disease Codes as Covariates

Incorporating Medical Code Descriptions for Diagnosis Prediction in Healthcare

Disease phenotyping using deep learning: A diabetes case study

Multimodal Machine Learning for Automated ICD Coding

Supervised Extraction of Diagnosis Codes from EMRs: Role of Feature Selection, Data Selection, and Probabilistic Thresholding

A General Framework for Diagnosis Prediction Via Incorporating Medical Code Descriptions

Rare Codes Count: Mining Inter-code Relations for Long-tail Clinical Text Classification

Deep-ADCA: Development and Validation of Deep Learning Model for Automated Diagnosis Code Assignment Using Clinical Notes in Electronic Medical Records

A machine learning model for predicting congenital heart defects from administrative data

The accuracy vs. coverage trade-off in patient-facing diagnosis models

A Scalable Workflow to Build Machine Learning Classifiers with Clinician-in-the-Loop to Identify Patients in Specific Diseases

A retrospective analysis using comorbidity detecting algorithmic software to determine the incidence of International Classification of Diseases (ICD) code omissions and appropriateness of Diagnosis-Related Group (DRG) code modifiers

Characterizing diseases using genetic and clinical variables: A data analytics approach

Enhancing diagnostic accuracy in symptom-based health checkers: a comprehensive machine learning approach with clinical vignettes and benchmarking

Benchmarking Large Language Models for Extraction of International Classification of Diseases Codes from Clinical Documentation