Abstract

Background: Medical coding is the process that converts clinical documentation into standard medical codes. Codes are used for several key purposes in a hospital (eg, insurance reimbursement and performance analysis); therefore, their optimization is crucial. With the rapid growth of natural language processing technologies, several solutions based on artificial intelligence have been proposed to aid in medical coding by automatically suggesting relevant codes for clinical documents. However, their effectiveness is still limited to simple cases, and it is not yet clear how much value they can bring in improving coding efficiency and accuracy.

Objective: This study aimed to bring more efficiency to the coding process to improve the selection of codes by medical coders. To achieve this, we developed an innovative multimodal machine learning–based solution that, instead of predicting codes, detects the degree of coding complexity before coding is performed. The notion of coding complexity was used to better dispatch work among medical coders to eventually minimize errors and improve throughput.

Methods: To train and evaluate our approach, we collected 2060 cases rated by coders in terms of coding complexity from 1 (simplest) to 4 (most complex). We asked 2 expert coders to rate 3.01% (62/2060) of the cases as the gold standard. The agreements between experts were used as benchmarks for model evaluation. A case contains both clinical text and patient metadata from the hospital electronic health record. We extracted both text features and metadata features, then concatenated and fed them into several machine learning models. Finally, we selected 2 models. The first used cross-validated training on 1751 cases and testing on 309 cases aiming to assess the predictive power of the proposed approach and its generalizability. The second model was trained on 1998 cases and tested on the gold standard to validate the best model performance against human benchmarks.
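The feature concatenation step described in the Methods can be illustrated with a minimal sketch. The abstract does not specify which text or metadata features were used, so everything here is an assumption made for illustration: the bag-of-words text representation, the hypothetical metadata fields `age` and `length_of_stay`, and the toy cases. It only shows the general idea of joining a text vector and a metadata vector into one input row per case.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def text_features(text, vocabulary):
    """Bag-of-words counts over a fixed vocabulary (unknown tokens are ignored)."""
    counts = Counter(text.lower().split())
    # Dicts preserve insertion order, so columns follow the sorted vocabulary.
    return [float(counts.get(tok, 0)) for tok in vocabulary]


def metadata_features(metadata, fields):
    """Numeric metadata values in a fixed field order."""
    return [float(metadata[field]) for field in fields]


def case_features(case, vocabulary, fields):
    """Concatenate text features and metadata features into one vector."""
    return text_features(case["text"], vocabulary) + metadata_features(
        case["metadata"], fields
    )


# Toy cases; the texts and metadata fields are hypothetical, not from the study.
cases = [
    {"text": "chest pain admitted overnight",
     "metadata": {"age": 64, "length_of_stay": 1}},
    {"text": "sepsis with renal failure",
     "metadata": {"age": 81, "length_of_stay": 12}},
]
fields = ["age", "length_of_stay"]

vocabulary = build_vocabulary(case["text"] for case in cases)
X = [case_features(case, vocabulary, fields) for case in cases]

# Each row has one column per vocabulary token plus one per metadata field.
print(len(X[0]))  # 8 text columns + 2 metadata columns = 10
```

The resulting rows `X` could then be passed to any classifier that accepts fixed-length numeric vectors; in practice the metadata columns would likely need scaling so that they do not dominate the text counts.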
Results: Our first model achieved a macro-F1-score of 0.51 and an accuracy of 0.59 on classifying the 4-scale complexity. The model distinguished well between the simple (combined complexity 1-2) and complex (combined complexity 3-4) cases with a macro-F1-score of 0.65 and an accuracy of 0.71. Our second model achieved 61% agreement with experts' ratings and a macro-F1-score of 0.62 on the gold standard, whereas the 2 experts had a 66% (41/62) agreement ratio with a macro-F1-score of 0.67.

Conclusions: We propose a multimodal machine learning approach that leverages information from both clinical text and patient metadata to predict the complexity of coding a case in the precoding phase. By integrating this model into the hospital coding system, distribution of cases among coders can be done automatically with performance comparable with that of human expert coders, thus improving coding efficiency and accuracy at scale.
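The metrics reported in the Results can be made concrete with a short sketch. It computes accuracy (which, applied to two raters, is also the agreement ratio), the macro-averaged F1-score, and the same metrics after collapsing the 4-level scale into simple (1-2) versus complex (3-4). The label lists below are made-up toy data, not the study's ratings, and the function names are chosen here for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the two label lists agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def to_binary(rating):
    """Collapse the 1-4 complexity scale: 1-2 -> simple, 3-4 -> complex."""
    return "simple" if rating <= 2 else "complex"


# Toy ratings on the 1-4 scale (illustrative only).
y_true = [1, 2, 3, 4, 1, 2, 3, 4]
y_pred = [1, 2, 3, 3, 2, 2, 4, 4]

acc_4 = accuracy(y_true, y_pred)
f1_4 = macro_f1(y_true, y_pred)

# Collapse both lists to the binary task and score again.
bin_true = [to_binary(r) for r in y_true]
bin_pred = [to_binary(r) for r in y_pred]
acc_2 = accuracy(bin_true, bin_pred)
f1_2 = macro_f1(bin_true, bin_pred)

print(acc_4, round(f1_4, 4))  # 0.625 0.6167
print(acc_2, f1_2)            # 1.0 1.0
```

In this toy data every error stays within the simple or the complex group, so the binary scores are perfect while the 4-level scores are not. This is the same effect, in exaggerated form, that makes the reported binary scores (0.65 and 0.71) higher than the 4-level ones (0.51 and 0.59). The expert agreement ratio of 66% (41/62) corresponds to calling `accuracy` on the two experts' rating lists.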
