Abstract: Text similarity calculation is a core component of commercial artificial intelligence applications such as traditional Chinese medicine (TCM) auxiliary diagnosis, intelligent question answering, and prescription recommendation. However, TCM texts pose several difficulties: sentences are short, word segmentation is inaccurate, semantic relevance is strong, and features are high-dimensional and sparse. This study takes into account the temporal information of sentence context and proposes a TCM text similarity calculation model based on a bidirectional temporal Siamese network (BTSN). We used the enhanced representation through knowledge integration (ERNIE) pretrained language model to train character vectors instead of word vectors, which avoids the problem of inaccurate word segmentation in TCM texts. In the Siamese network, the traditional fully connected neural network was replaced by a deep bidirectional long short-term memory (BLSTM) network to capture the contextual semantics of the current word. The improved BLSTM maps the pair of sentences to be tested into two low-dimensional numerical vectors, on which similarity calculation training is then performed. Experiments on two datasets, one financial and one TCM, show that the BTSN model outperformed the other similarity calculation models compared. The model achieved its highest accuracy when the BLSTM had six layers. This verifies that the proposed text similarity calculation model has high engineering value.
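The core Siamese idea in the abstract is that one shared encoder maps both sentences to low-dimensional vectors, which are then compared. The following is a minimal pure-Python sketch of that pattern, not the BTSN implementation. It rests on several assumptions: random per-character vectors stand in for ERNIE character embeddings, the deep BLSTM is untrained and has no biases, sentence vectors are formed by mean pooling, and the exp(-L1) similarity score (as used in Manhattan-LSTM models) is an illustrative choice that the abstract does not specify. All names (`DeepBiLSTMEncoder`, `char_embeddings`, `siamese_similarity`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """One-direction LSTM layer with untrained random weights and no biases (assumption)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim  # each gate reads the concatenation [x_t; h_{t-1}]

        def init():
            # a fairly wide uniform range helps keep activations from shrinking toward zero in this toy
            return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(hidden_dim)]

        self.W_i, self.W_f, self.W_o, self.W_g = init(), init(), init(), init()

    def run(self, xs):
        """Return the hidden state at every time step of the sequence xs."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        states = []
        for x in xs:
            z = x + h  # list concatenation, not element-wise addition
            i = [sigmoid(a) for a in matvec(self.W_i, z)]  # input gate
            f = [sigmoid(a) for a in matvec(self.W_f, z)]  # forget gate
            o = [sigmoid(a) for a in matvec(self.W_o, z)]  # output gate
            g = [math.tanh(a) for a in matvec(self.W_g, z)]  # candidate cell state
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            states.append(h)
        return states


class DeepBiLSTMEncoder:
    """Stack of bidirectional LSTM layers shared by both Siamese branches."""

    def __init__(self, input_dim, hidden_dim, num_layers, seed=0):
        rng = random.Random(seed)
        self.layers = []
        dim = input_dim
        for _ in range(num_layers):
            self.layers.append((LSTMCell(dim, hidden_dim, rng), LSTMCell(dim, hidden_dim, rng)))
            dim = 2 * hidden_dim  # the next layer reads [forward; backward] states

    def encode(self, xs):
        """Map a sequence of character vectors to one fixed-size sentence vector."""
        if not xs:
            raise ValueError("empty sentence")
        seq = xs
        for fwd, bwd in self.layers:
            f_states = fwd.run(seq)
            b_states = bwd.run(seq[::-1])[::-1]  # read right-to-left, then re-align in time
            seq = [f + b for f, b in zip(f_states, b_states)]
        # mean pooling over time steps (an assumption; BTSN may pool differently)
        return [sum(step[k] for step in seq) / len(seq) for k in range(len(seq[0]))]


def char_embeddings(text, dim, table, rng):
    """Hypothetical stand-in for ERNIE: one cached random vector per character."""
    for ch in text:
        if ch not in table:
            table[ch] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return [table[ch] for ch in text]


def siamese_similarity(encoder, a_vecs, b_vecs):
    """Encode both sentences with the SAME encoder, then score exp(-L1 distance) in (0, 1]."""
    ha = encoder.encode(a_vecs)
    hb = encoder.encode(b_vecs)
    return math.exp(-sum(abs(x - y) for x, y in zip(ha, hb)))


if __name__ == "__main__":
    rng = random.Random(42)
    table = {}
    encoder = DeepBiLSTMEncoder(input_dim=8, hidden_dim=4, num_layers=6, seed=1)
    s1 = char_embeddings("头痛发热", 8, table, rng)
    s2 = char_embeddings("发热恶寒", 8, table, rng)
    print(siamese_similarity(encoder, s1, s1))  # 1.0 for identical input
    print(siamese_similarity(encoder, s1, s2))  # untrained weights, so this value is arbitrary
```

Because both branches call the same `encoder`, the score is symmetric and a sentence compared with itself scores exactly 1.0. In a trained system, the BLSTM weights would be learned from labeled sentence pairs and the character vectors would come from the pretrained language model rather than from a random table.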

A Korean Sentence Similarity Calculation Method Based on Sub-Word Level Information

Semantic Sequence Kin: A Method of Document Copy Detection

MNet-Sim: A Multi-layered Semantic Similarity Network to Evaluate Sentence Similarity

Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach

Sentence Similarity Computation in Question Answering Robot

Improving Sentence Similarity Estimation for Unsupervised Extractive Summarization

Chinese Word Similarity Computing Based on Combination Strategy

Research on Chinese Semantic Similarity Algorithm

Chinese Sentence Similarity Measure Based on Word Sequence Length and Word Weight

Traditional Chinese Medicine Text Similarity Calculation Model Based on the Bidirectional Temporal Siamese Network

Improving text similarity measurement by critical sentence vector model

Sentence Similarity Computation by Integrating Shallow and Deep Information

Semantic Similarity Computing Model Based on Multi Model Fine-Grained Nonlinear Fusion

Korean-Centered Cross-Lingual Parallel Sentence Corpus Construction Experiment

ERCNN: Enhanced Recurrent Convolutional Neural Networks for Learning Sentence Similarity

Sentence Similarity Computation Based on Feature Set

Empirical Analysis of Korean Public AI Hub Parallel Corpora and in-depth Analysis using LIWC

Sentence Similarity Based on Contexts

Chinese Sentences Similarity via Cross-Attention Based Siamese Network

Distance Based Korean WordNet (alias. KorLex) Embedding Model

Sentence-Embedding and Similarity via Hybrid Bidirectional-LSTM and CNN Utilizing Weighted-Pooling Attention