SiBERT: A Siamese-based BERT network for Chinese medical entities alignment

Zerui Ma, Linna Zhao, Jianqiang Li, Xi Xu, Jing Li
DOI: https://doi.org/10.1016/j.ymeth.2022.07.003
IF: 4.647
2022-09-01
Methods
Abstract: Entity alignment aims to associate semantically similar entities in knowledge graphs from different sources. It is widely used in the integration and construction of professional medical knowledge. Existing deep learning methods lack a term-level embedding representation, which limits entity alignment performance and incurs massive computational overhead. To address these problems, we propose a Siamese-based BERT network (SiBERT) for Chinese medical entity alignment. SiBERT generates term-level embeddings from word embedding sequences to enhance entity features in similarity calculation. The entity alignment process consists of three steps. First, SiBERT is pre-trained on a public-domain synonym dictionary and then transferred to the medical entity alignment task. Second, entities in four categories (disease, symptom, treatment, and examination) are labeled with standard terms selected from a standard terms dataset, and the entities and their standard terms form term pairs used to train SiBERT. Finally, SiBERT is combined with the entity alignment algorithm to select the most similar standard term as the final result. To evaluate the effectiveness of our method, we conduct extensive experiments on real-world datasets. The experimental results show that SiBERT outperforms the compared algorithms in both alignment accuracy and computational efficiency.
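The final alignment step described in the abstract (encode an entity as one term-level vector, compare it with every standard term, keep the most similar) can be illustrated with a minimal sketch. This is not the authors' implementation: `char_vector` is a hypothetical hash-based stand-in for BERT token embeddings, mean pooling is an assumed way of turning a word embedding sequence into a term-level embedding, and cosine similarity is an assumed similarity measure. The names `encode_term`, `build_index`, and `align` are illustrative only.

```python
import hashlib
import math

DIM = 16  # toy embedding size (assumption; real BERT vectors are much larger)


def char_vector(ch):
    """Deterministic toy embedding for one character.

    Hypothetical stand-in for a BERT token embedding; it carries no semantics.
    """
    digest = hashlib.sha256(ch.encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def encode_term(term):
    """Mean-pool the character vectors into one fixed-size term-level vector."""
    vectors = [char_vector(ch) for ch in term]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_index(standard_terms):
    """Encode every standard term once so the vectors can be reused."""
    return {term: encode_term(term) for term in standard_terms}


def align(entity, index):
    """Return (standard_term, similarity) for the most similar standard term."""
    query = encode_term(entity)
    scored = ((term, cosine(query, vec)) for term, vec in index.items())
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    index = build_index(["高血压", "糖尿病", "头痛"])
    term, score = align("糖尿病", index)
    print(term, round(score, 3))
```

Because both inputs pass through the same encoder, the standard-term vectors can be computed once in `build_index` and reused for every query, so aligning a new entity costs one encoding plus one similarity comparison per standard term. With the toy encoder, only an exact match is guaranteed to score highest (similarity 1.0 up to floating-point rounding); ranking synonyms correctly would require a trained encoder such as the one the abstract describes.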
biochemistry & molecular biology, biochemical research methods