Chinese long text similarity calculation of semantic progressive fusion based on Bert

Xiao Li, Lanlan Hu
DOI: https://doi.org/10.3233/jcm-247245
2024-08-18
Journal of Computational Methods in Sciences and Engineering
Abstract: Text similarity measures how closely two or more texts resemble each other, and it is widely used across natural language processing tasks. As deep learning has matured, many neural network models have been applied to text similarity calculation and have achieved good results on sentences and short texts. Among them, the BERT model has become a research hotspot in this field because of its excellent performance. However, existing similarity algorithms perform poorly on long texts, and they fail to extract the richer semantic information hidden in the structure of long documents. This paper takes Chinese long text as its research object. It proposes a long text similarity calculation method that uses the sentence sequence instead of the word-level sequence, and it constructs a long text semantic representation model based on semantic progressive fusion. The aim is to solve practical problems faced by applications and natural language processing tasks that depend on long text semantics, and to break through the bottleneck of long text similarity calculation.
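The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of comparing long texts at the sentence level rather than the word level. Everything in it is an assumption made for illustration: `toy_sentence_vector` is a hypothetical stand-in for a BERT sentence embedding (it merely counts characters in hashed buckets), and `fuse_progressively` uses a simple left-to-right weighted average with a made-up weight `alpha` as a placeholder for the paper's semantic progressive fusion, whose actual form is not described here.

```python
import math
import re

DIM = 64  # size of the toy embedding space (assumption, not from the paper)


def split_sentences(text):
    """Split text into sentences on common Chinese and Western end punctuation."""
    parts = re.split(r"[。！？!?；;\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def toy_sentence_vector(sentence):
    """Hypothetical stand-in for a BERT sentence embedding.

    Counts characters in DIM buckets keyed by code point. A real system
    would replace this with an encoder's sentence representation.
    """
    vec = [0.0] * DIM
    for ch in sentence:
        vec[ord(ch) % DIM] += 1.0
    return vec


def fuse_progressively(vectors, alpha=0.5):
    """Fold sentence vectors left to right into one document vector.

    Placeholder fusion rule (assumption): at each step the running
    document vector is averaged with the next sentence vector.
    """
    doc = list(vectors[0])
    for vec in vectors[1:]:
        doc = [alpha * d + (1.0 - alpha) * v for d, v in zip(doc, vec)]
    return doc


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def long_text_similarity(text_a, text_b):
    """Compare two long texts through their fused sentence-level vectors."""
    vecs_a = [toy_sentence_vector(s) for s in split_sentences(text_a)]
    vecs_b = [toy_sentence_vector(s) for s in split_sentences(text_b)]
    if not vecs_a or not vecs_b:
        return 0.0
    return cosine(fuse_progressively(vecs_a), fuse_progressively(vecs_b))
```

Calling `long_text_similarity(doc_a, doc_b)` returns a score between 0 and 1 for this toy embedding, because all vector components are non-negative. The point of the sketch is the pipeline shape (sentence splitting, per-sentence encoding, fusion into a document vector, then comparison), not the quality of the scores.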