Unsupervised Sentence Embedding Model Based on Contrastive Learning

Jianhou Gan, Jun Wang, Mingjie Wang, Zijie Li
DOI: https://doi.org/10.1109/ICCCS57501.2023.10151113
2023-04-21
Abstract: The unsupervised sentence embedding model SimCSE, built on a contrastive learning framework, uses dropout noise as its data augmentation method. This setup tends to bias the model toward treating sentences of the same length as more semantically similar, and the randomness of dropout may cause loss of semantic information or large discrepancies between the resulting sentence embeddings. To address these problems, we propose two proxy tasks: random deletion and R-Dropout. We conducted experiments on the semantic textual similarity (STS) task using the publicly available datasets STS12-16, STS-B, and SICK-R. The results show that our proposed sentence embedding model improves the average Spearman correlation coefficient to 77.67%, compared with the baseline models IS-BERTbase, CT-BERTbase, and SimCSE-… We also used the SentEval toolkit to evaluate the quality of the sentence embeddings generated by the model, using them as features for the transfer classification tasks MR, SUBJ, MPQA, TREC, and MRPC. The results show that our proposed model achieves higher classification accuracy in all cases.
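The abstract names its ingredients without giving formulas, so the following pure-Python sketch illustrates them under stated assumptions. It is not the authors' implementation. It shows (1) random deletion as a hypothetical augmentation that yields a positive view of a different length than the anchor, which is one plausible reading of how it counters the length bias; (2) a SimCSE-style in-batch contrastive (InfoNCE) loss; (3) an R-Dropout-style symmetric KL penalty between two dropout passes; and (4) the Spearman correlation used for STS scoring. The helper names, the deletion probability `p=0.1`, the temperature `0.05`, and applying the R-Dropout penalty to similarity logits are all illustrative assumptions.

```python
import math
import random


def random_deletion(tokens, p=0.1, rng=None):
    """Hypothetical augmentation: drop each token independently with probability p.

    p=0.1 is an illustrative default, not the paper's value. At least one token
    is kept so the augmented view is never empty.
    """
    if rng is None:
        rng = random.Random()
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def log_softmax(xs):
    """Numerically stable log-softmax."""
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def similarity_logits(anchor, candidates, temperature=0.05):
    """Temperature-scaled cosine similarities of one anchor against every candidate."""
    return [cosine(anchor, c) / temperature for c in candidates]


def info_nce(anchors, positives, temperature=0.05):
    """SimCSE-style unsupervised contrastive loss (assumed form).

    positives[i] is the second view of anchors[i] (e.g. a second dropout pass or
    a randomly deleted copy); the other rows act as in-batch negatives.
    """
    total = 0.0
    for i, a in enumerate(anchors):
        total -= log_softmax(similarity_logits(a, positives, temperature))[i]
    return total / len(anchors)


def r_dropout_penalty(logits_1, logits_2):
    """Symmetric KL between the softmax distributions of two dropout passes.

    Where the paper attaches this term is not stated in the abstract; here it is
    assumed to compare two sets of logits for the same input.
    """
    lp, lq = log_softmax(logits_1), log_softmax(logits_2)
    kl_pq = sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))  # KL(P || Q)
    kl_qp = sum(math.exp(b) * (b - a) for a, b in zip(lp, lq))  # KL(Q || P)
    return 0.5 * (kl_pq + kl_qp)


def spearman(xs, ys):
    """Spearman rank correlation 1 - 6*sum(d^2) / (n(n^2 - 1)); assumes no ties."""
    def ranks(vals):
        r = [0] * len(vals)
        for rank, idx in enumerate(sorted(range(len(vals)), key=lambda i: vals[i])):
            r[idx] = rank
        return r

    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

With perfectly aligned views, for example `info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])`, the loss is close to zero, while swapping the positives drives it up sharply. A real training setup would compute these terms on encoder outputs in a tensor library, and would typically use a tie-aware Spearman implementation such as `scipy.stats.spearmanr` rather than the no-ties shortcut above.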
Computer Science