Abstract: The Unified Medical Language System (UMLS) Metathesaurus construction process relies mainly on lexical algorithms and manual expert curation to integrate over 200 biomedical vocabularies. A lexical-based learning model (LexLM) was developed to predict synonymy among Metathesaurus terms, and it largely outperforms a rule-based approach (RBA) that approximates the current construction process. However, the LexLM can be improved further because it uses only lexical information from the source vocabularies, whereas the RBA also takes advantage of contextual information. We investigate the role of multiple types of contextual information available to the UMLS editors, namely source synonymy (SS), source semantic group (SG), and source hierarchical relations (HR), for the UMLS vocabulary alignment (UVA) problem. In this paper, we develop multiple variants of context-enriched learning models (ConLMs) by adding these types of contextual information to the LexLM. We represent the context types in context-enriched knowledge graphs (ConKGs) with four variants: ConSS, ConSG, ConHR, and ConAll. We train the ConKG embeddings using seven KG embedding techniques. We create the ConLMs by concatenating the ConKG embedding vectors with the word embedding vectors from the LexLM. We evaluate the performance of the ConLMs using the UVA generalization test datasets with hundreds of millions of pairs. Our extensive experiments show a significant performance improvement from the ConLMs over the LexLM, namely +5.0% in precision (93.75%), +0.69% in recall (93.23%), and +2.88% in F1 (93.49%) for the best ConLM. Our experiments also show that the ConAll variant, which includes all three context types, takes more time but does not always perform better than variants with a single context type.
Finally, our experiments show that the pairs of terms with high lexical similarity benefit most from adding contextual information, namely +6.56% in precision (94.97%), +2.13% in recall (93.23%), and +4.35% in F1 (94.09%) for the best ConLM. The pairs with lower degrees of lexical similarity also show performance improvement, with +0.85% in F1 (96%) for low similarity and +1.31% in F1 (96.34%) for no similarity. These results demonstrate the importance of using contextual information in the UVA problem.
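
The concatenation step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The lookup tables, the term identifier "A0001", the vector dimensions, and the averaging of word vectors are all assumptions made for illustration; the abstract only states that ConKG embedding vectors are concatenated with the word embedding vectors from the LexLM.

```python
from typing import Dict, List

# Toy lookup tables. All values and dimensions are invented for illustration
# and do not come from the paper.
WORD_EMB: Dict[str, List[float]] = {
    "heart": [0.1, 0.3],
    "attack": [0.7, 0.2],
}
KG_EMB: Dict[str, List[float]] = {
    "A0001": [0.5, -0.1, 0.4],  # hypothetical term identifier
}


def mean_word_vector(term: str) -> List[float]:
    """Average the word vectors of a term.

    Averaging is a simplification chosen for this sketch, not a claim
    about how the LexLM combines word embeddings.
    """
    vectors = [WORD_EMB[word] for word in term.lower().split()]
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def context_enriched_vector(term: str, term_id: str) -> List[float]:
    """Concatenate the lexical vector with the KG embedding vector."""
    return mean_word_vector(term) + KG_EMB[term_id]


features = context_enriched_vector("Heart attack", "A0001")
print(len(features))  # 2 lexical dimensions + 3 KG dimensions = 5
```

In this sketch the context-enriched representation simply has the lexical dimensions followed by the KG dimensions, so a downstream classifier sees both kinds of signal for each term.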

An Empirical Study of UMLS Concept Extraction from Clinical Notes using Boolean Combination Ensembles

Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis

Exploiting Collaborative Learning for Concept Extraction in the Medical Field

Using concept-based indexing to improve language modeling approach to genomic IR

Context-Enriched Learning Models for Aligning Biomedical Vocabularies at Scale in the UMLS Metathesaurus

Extracting UMLS Concepts from Medical Text Using General and Domain-Specific Deep Learning Models

Comparison of MetaMap and cTAKES for entity extraction in clinical notes

On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models

Solving the Right Problem is Key for Translational NLP: A Case Study in UMLS Vocabulary Insertion

Document-level Clinical Entity and Relation Extraction via Knowledge Base-Guided Generation

One LLM is not Enough: Harnessing the Power of Ensemble Learning for Medical Question Answering

Evaluating Biomedical BERT Models for Vocabulary Alignment at Scale in the UMLS Metathesaurus

Combining word embeddings to extract chemical and drug entities in biomedical literature

Extracting clinical concepts from user queries

Integrating UMLS Knowledge into Large Language Models for Medical Question Answering

Exploring the In-context Learning Ability of Large Language Model for Biomedical Concept Linking

Prospective Study for Semantic Inter-Media Fusion in Content-Based Medical Image Retrieval

UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition

A Natural Language Processing Approach to Support Biomedical Data Harmonization: Leveraging Large Language Models

INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning