Disambiguating Clinical Abbreviations by One-to-All Classification: Algorithm Development and Validation Study

Sheng-Feng Sung, Ya-Han Hu, Chong-Yan Chen
DOI: https://doi.org/10.2196/56955
IF: 3.2
2024-10-02
JMIR Medical Informatics
Abstract:
Background: Electronic Medical Records (EMRs) store extensive patient data and serve as a comprehensive repository, including textual medical records like surgical and imaging reports. Their utility in Clinical Decision Support Systems (CDSS) is significant, but the widespread use of ambiguous and unstandardized abbreviations in clinical documents poses challenges for Natural Language Processing (NLP) in CDSS. Efficient abbreviation disambiguation methods are needed for effective information extraction.
Objective: This study aims to enhance the One-to-All (OTA) framework for clinical abbreviation expansion, which uses a single model to predict multiple meanings of abbreviations. The objective is to improve OTA by developing context-candidate pairs and optimizing word embeddings in BERT, evaluating the model's efficacy in expanding clinical abbreviations using real data.
Methods: Three datasets were used: MSH WSD, UMN, and CYCH from Ditmanson Medical Foundation Chia-Yi Christian Hospital. Texts containing polysemous abbreviations were preprocessed and formatted for BERT. The study involved fine-tuning pre-trained models, ClinicalBERT and BlueBERT, generating dataset pairs for training and testing based on Huang et al.'s (2019) method.
Results: BlueBERT achieved macro and micro accuracies of 95.41% and 95.16% on the MSH WSD dataset, respectively. It improved macro accuracy by 0.54-1.53% compared to two baselines, LSTM and deepBioWSD with random embedding. On the UMN dataset, BlueBERT recorded macro and micro accuracies of 98.40% and 98.22%, respectively. Against the baselines of Word2Vec+SVM and BioWordVec+SVM, BlueBERT demonstrated a macro accuracy improvement of 2.61-4.13%.
Conclusions: This research preliminarily validates the effectiveness of the OTA method for abbreviation disambiguation in medical texts, demonstrating potential to enhance both clinical staff efficiency and research effectiveness.
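To make the abstract's two central ideas concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a context-candidate pair couples one clinical sentence with one candidate expansion and carries a binary label, and that macro accuracy is the mean of per-abbreviation accuracies while micro accuracy pools all instances. The sense inventory, example sentence, predictions, and the function names `build_pairs` and `macro_micro_accuracy` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sense inventory; a real one would come from the dataset.
SENSES = {"RA": ["rheumatoid arthritis", "right atrium", "room air"]}


def build_pairs(context, abbreviation, true_sense, sense_inventory):
    """Return one (context, candidate, label) triple per candidate expansion.

    Assumption: label is 1 for the correct expansion and 0 otherwise, so a
    single binary classifier can serve every abbreviation.
    """
    return [
        (context, candidate, int(candidate == true_sense))
        for candidate in sense_inventory[abbreviation]
    ]


def macro_micro_accuracy(records):
    """Compute (macro, micro) accuracy.

    records: iterable of (abbreviation, gold_sense, predicted_sense).
    Macro averages the per-abbreviation accuracies; micro is the share of
    correct predictions over all instances pooled together.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for abbreviation, gold, predicted in records:
        total[abbreviation] += 1
        correct[abbreviation] += int(gold == predicted)
    micro = sum(correct.values()) / sum(total.values())
    macro = sum(correct[a] / total[a] for a in total) / len(total)
    return macro, micro


# Illustrative sentence and predictions (made up, not study data).
pairs = build_pairs("Pt saturating 98% on RA.", "RA", "room air", SENSES)

records = [
    ("RA", "room air", "room air"),
    ("RA", "right atrium", "room air"),
    ("RA", "rheumatoid arthritis", "rheumatoid arthritis"),
    ("RA", "room air", "room air"),
    ("PT", "physical therapy", "physical therapy"),
    ("PT", "prothrombin time", "prothrombin time"),
]
macro, micro = macro_micro_accuracy(records)
print(f"macro={macro:.4f} micro={micro:.4f}")
```

In this toy data the two metrics differ because "RA" has more instances and a lower accuracy than "PT", which is why studies of this kind tend to report both figures.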
medical informatics