Abstract:

Background: A considerable amount of meaningful information in Chinese clinical data is routinely recorded as free text, referred to here as Chinese clinical terms. The lack of coding is a major obstacle to using these terms. SNOMED CT is a comprehensive clinical health care terminology, widely used because of its coverage, granularity, clinical orientation, and logical underpinning. Automatically assigning SNOMED CT codes to Chinese clinical terms would be useful and efficient, but several problems remain. Current cross-language clinical term matching studies rely on external resources, such as machine translation and rule-based methods. Semantic matching methods have achieved strong performance on text matching, but few studies have applied them to cross-language clinical term matching. We present an effective attention-based semantic matching algorithm that automatically assigns SNOMED CT codes to Chinese clinical terms across languages.

Method: First, BERT was used to convert the input into word embeddings. Next, the word embeddings were encoded by a BiLSTM with self-attention, which captures long-distance relationships among words and weights each word according to its contribution to semantic matching. Decomposable attention was then applied, making semantic matching trivially parallelizable and speeding up computation. Finally, fully connected layers and a sigmoid function produced the matching result.

Results: To evaluate the proposed method, 29,960 manually coded Chinese clinical terms, 30,040 unmatched Chinese clinical terms and SNOMED CT codes were collected. Compared with the existing semantic matching method, the proposed approach achieved state-of-the-art results, with an accuracy of 0.905, a precision of 0.856, a recall of 0.518, and an F-measure of 0.645, demonstrating its effectiveness. The proposed Chinese-English bilingual term mapping, Chinese character-level and word-level encoder, English word-level encoder, BERT model, and attention mechanism performed better than other methods.

Conclusion: The proposed approach, automatic SNOMED CT coding of Chinese clinical terms via attention-based semantic matching, can improve both the performance and the efficiency of automated SNOMED CT code assignment for Chinese clinical terms.
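The last two stages of the method (decomposable attention followed by fully connected layers and a sigmoid) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the attend, compare, aggregate structure of the standard decomposable attention model; the toy 2-dimensional vectors stand in for the BERT and BiLSTM encoder outputs, an element-wise product stands in for the learned comparison network, a single linear layer stands in for the fully connected layers, and every function name, weight, and bias is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def attend(a, b):
    """Soft-align every token of a to b, and every token of b to a."""
    scores = [[dot(ai, bj) for bj in b] for ai in a]
    # beta[i]: the part of b that is softly aligned to token a[i].
    beta = [weighted_sum(softmax(row), b) for row in scores]
    # alpha[j]: the part of a that is softly aligned to token b[j].
    columns = [[scores[i][j] for i in range(len(a))] for j in range(len(b))]
    alpha = [weighted_sum(softmax(col), a) for col in columns]
    return beta, alpha


def compare(tokens, aligned):
    """Assumed stand-in for a learned comparison network: element-wise product."""
    return [[x * y for x, y in zip(t, s)] for t, s in zip(tokens, aligned)]


def aggregate(compared):
    """Sum the per-token comparison vectors into one fixed-size vector."""
    dim = len(compared[0])
    return [sum(vec[k] for vec in compared) for k in range(dim)]


def match_probability(a, b, weights, bias):
    """Score a term pair with one linear layer and a sigmoid (hypothetical weights)."""
    beta, alpha = attend(a, b)
    features = aggregate(compare(a, beta)) + aggregate(compare(b, alpha))
    logit = dot(weights, features) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy encoder outputs (assumed): one 2-d vector per token.
chinese_term = [[1.0, 0.0], [0.0, 1.0]]
similar_concept = [[1.0, 0.0], [0.0, 1.0]]
dissimilar_concept = [[-1.0, 0.0], [0.0, -1.0]]

weights, bias = [1.0, 1.0, 1.0, 1.0], 0.0  # hypothetical, not trained
p_match = match_probability(chinese_term, similar_concept, weights, bias)
p_other = match_probability(chinese_term, dissimilar_concept, weights, bias)
print(p_match > p_other)
```

Each row and each column of the score matrix is normalized independently, so the alignment for one token does not depend on the result for another; this independence is what allows the step to be parallelized across tokens.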

Automatic SNOMED CT coding of Chinese clinical terms via attention-based semantic matching
