Disambiguation of Medical Abbreviations for Knowledge Organization.
Yueyan Li, Hao Wang, Xiaomin Li, Sanhong Deng, Tong Su, Wei Zhang
DOI: https://doi.org/10.1016/j.ipm.2023.103441
IF: 7.466
2023-01-01
Information Processing & Management
Abstract: The disambiguation of abbreviations is a crucial step in medical knowledge organization. Most previous work has focused on disambiguating medical abbreviations within single sentences and has not systematically considered abbreviation disambiguation across a full article. In this work, we present a research framework for full-article medical abbreviation disambiguation (FMADRF) based on the structural characteristics of abbreviation-definition pairs in a full scientific medical article. Our method uses contextual semantic information, external linguistic features, and the mapping relationships and structural similarities between abbreviations and their expansions. The framework takes a four-pronged approach: identification of abbreviations, identification of abbreviation-definition pairs, alignment of abbreviations with their expansions, and complementation of abbreviations and their expansions. The results show that our novel BBF-BLC-R model improves the recognition and modification of abbreviation-definition pairs, achieving the best F1 score of 91.83%. Furthermore, our new strategy combines semantic and structural information to significantly improve term alignment, with an F1 score of 97.11%. In our test, a thesaurus of abbreviations and their expansions was constructed from 13,472 full-text medical articles, yielding 14,742 abbreviations with 31,327 corresponding expansions. This work enhances the semantic association of terms in full medical texts, eliminating the problems of "rich" semantics and association-relation roadblocks caused by term misalignments. It further provides technical and methodological support for the organization of medical knowledge, facilitating deep knowledge mining of full-text medical articles.
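
As a rough illustration of what a structural mapping between an abbreviation and its expansion can look like, the sketch below pairs a parenthesized short form with the words that precede it when their initial letters match. This is a simple rule-based heuristic chosen for illustration only; it is not the BBF-BLC-R model or the alignment strategy reported in the paper, and the function name `find_pairs` and the sample sentence are assumptions.

```python
import re


def find_pairs(text):
    """Return (abbreviation, expansion) pairs written as 'expansion (ABBR)'.

    Hypothetical baseline: an abbreviation is accepted only if each of its
    characters equals the first letter of one of the preceding words, in order.
    """
    pairs = []
    # A short form: one uppercase letter followed by 1-9 letters, digits or hyphens.
    for match in re.finditer(r"\(([A-Z][A-Za-z0-9-]{1,9})\)", text):
        abbr = match.group(1)
        letters = [c for c in abbr if c.isalnum()]
        # Words that appear before the opening parenthesis.
        words = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
        candidate = words[-len(letters):]
        same_length = len(candidate) == len(letters)
        if same_length and all(
            w[0].lower() == c.lower() for w, c in zip(candidate, letters)
        ):
            pairs.append((abbr, " ".join(candidate)))
    return pairs


sentence = (
    "Patients with chronic kidney disease (CKD) "
    "may develop acute kidney injury (AKI)."
)
print(find_pairs(sentence))
```

A heuristic of this kind only covers abbreviations defined in place by their initials; short forms that are never defined in the article, or whose letters do not map one-to-one onto word initials, are the cases that call for the semantic and contextual information the abstract describes.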