Abstract:

Background: The tenth revision of the International Classification of Diseases (ICD-10) is widely used for epidemiological research and health management. The Clinical Modification (ICD-10-CM) and Procedure Coding System (ICD-10-PCS) were developed to capture more clinical detail through an expanded set of diagnosis and procedure codes, and they are applied in diagnosis-related groups for reimbursement. This expansion of codes has made coding time-consuming and less accurate. A state-of-the-art model using deep contextual word embeddings has been used for automatic multilabel text classification of ICD-10 codes. Beyond using discharge diagnoses (DD) as input, performance can be improved by appropriately preprocessing text from other document types, such as medical history, comorbidity and complication, surgical method, and special examination.

Objective: This study aims to develop a contextual language model with rule-based preprocessing methods for ICD-10 multilabel classification.

Methods: We retrieved electronic health records from a medical center. First, we compared different word embedding methods for predicting ICD-10-CM codes: biomedical bidirectional encoder representations from transformers (BioBERT), clinical generalized autoregressive pretraining for language understanding (Clinical XLNet), label tree-based attention-aware deep model for high-performance extreme multilabel text classification (AttentionXML), and word-to-vector (Word2Vec). Second, we compared preprocessing methods using the best-performing embeddings. For ICD-10-CM, we included DD, medical history, and comorbidity and complication as inputs, and we compared prediction performance under different preprocessing steps: definition training, external cause code removal, number conversion, and combination code filtering.
For ICD-10-PCS, the model was trained using different combinations of DD, surgical method, and key words of special examination. The micro F1 score and the micro area under the receiver operating characteristic curve were used to compare model performance across the different preprocessing methods.

Results: BioBERT had an F1 score of 0.701 and outperformed the other models (Clinical XLNet, AttentionXML, and Word2Vec). For ICD-10-CM, the F1 score increased significantly from 0.749 (95% CI 0.744-0.753) to 0.769 (95% CI 0.764-0.773) with ICD-10 definition training, external cause code removal, number conversion, and combination code filtering. For ICD-10-PCS, the F1 score increased significantly from 0.670 (95% CI 0.663-0.678) to 0.726 (95% CI 0.719-0.732) with a combination of discharge diagnoses, surgical methods, and key words of special examination. With our preprocessing methods, the model achieved its highest area under the receiver operating characteristic curve: 0.853 (95% CI 0.849-0.855) for ICD-10-CM and 0.831 (95% CI 0.827-0.834) for ICD-10-PCS.

Conclusions: Our model, which combines a pretrained contextualized language model with rule-based preprocessing, performs better than the state-of-the-art model for both ICD-10-CM and ICD-10-PCS. This study highlights the importance of rule-based preprocessing methods derived from the coding rules that coders follow.
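The micro F1 score used to compare models above pools true positives, false positives, and false negatives over every record and every code before computing a single score. The following is a minimal sketch of that calculation for multilabel code sets, not the study's evaluation code; the function name `micro_f1` and the example records and codes are hypothetical and chosen only for illustration.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over multilabel predictions.

    gold[i] and pred[i] are the sets of codes assigned to record i
    by the human coder and by the model, respectively.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of records")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # codes predicted and correct
        fp += len(p - g)  # codes predicted but not in the gold set
        fn += len(g - p)  # gold codes the model missed

    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there is nothing to score.
    return 2 * tp / denom if denom else 0.0


# Hypothetical example: three records with illustrative ICD-10-CM codes.
gold = [{"I10", "E11.9"}, {"J18.9"}, {"N39.0", "I10"}]
pred = [{"I10"}, {"J18.9", "R05"}, {"N39.0", "I10"}]

print(round(micro_f1(gold, pred), 3))  # 0.8  (tp=4, fp=1, fn=1)
```

Because counts are pooled before averaging, frequent codes weigh more heavily in the micro F1 score than rare ones, which is worth keeping in mind when comparing results over a large, imbalanced code set.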

What Kind of Transformer Models to Use for the ICD-10 Codes Classification Task

A Comparative Evaluation Of Transformer Models For De-Identification Of Clinical Text Data

TransICD: Transformer Based Code-wise Attention Model for Explainable ICD Coding

Surface coil cardiac tagging and 31P spectroscopic localization with B1‐insensitive adiabatic pulses

Automatic International Classification of Diseases Coding System: Deep Contextualized Language Model With Rule-Based Approaches

Measurement of Semantic Textual Similarity in Clinical Texts: Comparison of Transformer-Based Models

Accurate and Well-Calibrated ICD Code Assignment Through Attention Over Diverse Label Embeddings

Training a Deep Contextualized Language Model for International Classification of Diseases, 10th Revision Classification via Federated Learning: Model Development and Validation Study

Transformers for Multi-label Classification of Medical Text: An Empirical Comparison

Towards BERT-based Automatic ICD Coding: Limitations and Opportunities

Combining transformer-based model and GCN to predict ICD codes from clinical records

Automatic ICD-10 Code Association: A Challenging Task on French Clinical Texts

Predicting Multiple ICD-10 Codes from Brazilian-Portuguese Clinical Notes

Modelling Temporal Document Sequences for Clinical ICD Coding

Transformers and large language models in healthcare: A review

Deep-ADCA: Development and Validation of Deep Learning Model for Automated Diagnosis Code Assignment Using Clinical Notes in Electronic Medical Records

Automatic ICD-10 Coding and Training System: Deep Neural Network Based on Supervised Learning

Ensemble neural models for ICD code prediction using unstructured and structured healthcare data

Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset

Transformer Models in Healthcare: A Survey and Thematic Analysis of Potentials, Shortcomings and Risks

Limitations of Transformers on Clinical Text Classification