Text Semantic Matching with an Enhanced Sample Building Method Based on Contrastive Learning

Lishan Wu, Jie Hu, Fei Teng, Tianrui Li, Shengdong Du
DOI: https://doi.org/10.1007/s13042-023-01823-8
2023-01-01
International Journal of Machine Learning and Cybernetics
Abstract: Text semantic matching aims to determine whether two pieces of text express the same meaning, and it has been widely applied in clinical terminology standardization, recommendation systems, and other scenarios. Many recent methods adopt contrastive learning, constructing positive and negative sample pairs for text semantic matching tasks. These methods first construct positive samples through data augmentation and then use the other samples in the same batch as negative samples. However, mainstream data augmentation methods such as dropout ignore the impact of sentence length, and the word repetition method is relatively complex to implement. In addition, a sufficient number of negative samples is crucial to the quality of model training. In this paper, we propose an enhanced sample building method (ESNCSE) to construct positive and negative samples for text semantic matching tasks. To generate positive sample pairs, we randomly insert punctuation marks into the original text, which adds noise simply and efficiently. To increase the number of negative samples without increasing the computational cost, we apply momentum contrast on top of SNCSE, a contrastive sentence embedding method with soft negative samples. Experimental results on semantic textual similarity tasks show that the average Spearman correlation coefficient is 79.74% for BERT-base and 80.64% for BERT-large.
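The punctuation-insertion step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `insert_punctuation`, the punctuation set `PUNCTUATION`, and the `ratio` parameter are all assumptions made for illustration, since the abstract does not specify which marks are inserted or how many.

```python
import random

# Hypothetical punctuation set; the abstract does not say which marks are used.
PUNCTUATION = [",", ".", ";", ":", "!", "?"]


def insert_punctuation(text, ratio=0.3, rng=None):
    """Return a noisy copy of `text` with random punctuation inserted between words.

    `ratio` caps the number of insertions relative to the word count
    (an assumed heuristic, not taken from the paper).
    """
    rng = rng or random.Random()
    words = text.split()
    if not words:
        return text
    # Insert at least one mark, and at most ratio * (number of words).
    n_insert = rng.randint(1, max(1, int(ratio * len(words))))
    # Pick distinct word indices; a mark is placed before each chosen word.
    positions = set(rng.sample(range(len(words)), min(n_insert, len(words))))
    out = []
    for i, word in enumerate(words):
        if i in positions:
            out.append(rng.choice(PUNCTUATION))
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    sentence = "two pieces of text express the same meaning"
    # The original sentence and its noisy copy would form a positive pair.
    print(insert_punctuation(sentence, rng=random.Random(0)))
```

In a contrastive setup, the original sentence and its augmented copy would be encoded separately and treated as a positive pair. The words themselves are left unchanged, so only the surface form and length of the sentence vary.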