A Deep Learning Based Method to Measure the Similarity of Long Text

Guohua Wang, Tianjian Zhang, Genpeng Xu, Yongsen Zheng, Zhiguo Du, Qi Long
DOI: https://doi.org/10.1109/ICISCAE51034.2020.9236879
2020-01-01
Abstract: Traditional methods for measuring text similarity are not accurate enough on complex text data, especially long text. We find that this is mainly because their feature representations are not expressive enough. To improve the accuracy of long-text similarity, we propose an algorithm that uses a pre-trained deep learning model to extract features from long text. On the THUCNews benchmark corpus, the accuracy of our method is 5.4% higher than that of the traditional algorithm. We also perform ablation experiments to test the improvement contributed by fine-tuning.
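As a rough illustration of how similarity could be computed once features have been extracted, the sketch below compares two documents by the cosine of their feature vectors. The abstract does not state which similarity measure or pooling scheme the authors use, so cosine similarity, the mean pooling over text chunks, and the function names `mean_pool` and `cosine_similarity` are all assumptions made for this example. The feature vectors themselves are taken as given; in practice they would come from a pre-trained model.

```python
import math
from typing import List, Sequence


def mean_pool(chunk_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-chunk feature vectors into one document vector.

    Assumption: a long text is split into chunks and each chunk already has
    a feature vector of the same dimension. This is not the paper's stated method.
    """
    if not chunk_vectors:
        raise ValueError("at least one chunk vector is required")
    dim = len(chunk_vectors[0])
    if any(len(vec) != dim for vec in chunk_vectors):
        raise ValueError("all chunk vectors must have the same dimension")
    count = len(chunk_vectors)
    return [sum(vec[i] for vec in chunk_vectors) / count for i in range(dim)]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (assumed measure)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Toy 2-dimensional vectors stand in for real model features.
doc_a = mean_pool([[1.0, 0.0], [0.0, 1.0]])
doc_b = mean_pool([[2.0, 2.0]])
similarity = cosine_similarity(doc_a, doc_b)
```

Here `doc_a` and `doc_b` point in the same direction, so their similarity is at the maximum of the cosine range; documents with orthogonal feature vectors would score zero.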