Abstract: Computing semantic similarity between sentences or texts is vital in many natural language processing (NLP) tasks such as search, query suggestion, and question answering (QA). Many methods have been developed, based on lexical matching, distributional semantics, and other techniques. However, lexical features such as string matching fail to capture semantic similarity. In this research, we focus on distributional representations and on how to tune parameters when learning word representations with commonly used word embedding techniques such as Word2Vec and GloVe. We conduct experiments on Chinese semantic sentence matching tasks in the finance domain. We evaluate the quality of the word embeddings by comparing the cosine similarity of semantically similar sentence pairs with that of semantically dissimilar pairs. In our experiments, Word2Vec performs better than GloVe in the sense that Chinese character embeddings from Word2Vec yield a larger gap in cosine distance between similar and dissimilar sentence pairs. We also report the optimal parameters for Word2Vec continuous bag-of-words (CBOW) found in our trials, a window size of 6 and an embedding dimension of 400, which may serve as good initial values for other projects.
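The evaluation criterion described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the abstract does not say how a sentence vector is built from character embeddings, so averaging the character vectors is an assumption, as is summarizing the "gap" as the mean cosine similarity of similar pairs minus that of dissimilar pairs. The function names `sentence_vector`, `cosine`, and `similarity_gap` and the toy two-dimensional embeddings are hypothetical; in practice the vectors would come from a trained model such as Word2Vec CBOW (window 6, dimension 400 in the reported setting).

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[str, str]


def sentence_vector(sentence: str, char_emb: Dict[str, Vector], dim: int) -> Vector:
    """Average the embeddings of the characters in `sentence`.

    Assumption (not from the source): characters without a vector are skipped,
    and a sentence with no known characters maps to the zero vector.
    """
    total = [0.0] * dim
    count = 0
    for ch in sentence:
        vec = char_emb.get(ch)
        if vec is None:
            continue
        for i, x in enumerate(vec):
            total[i] += x
        count += 1
    return [x / count for x in total] if count else total


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def similarity_gap(similar: List[Pair], dissimilar: List[Pair],
                   char_emb: Dict[str, Vector], dim: int) -> float:
    """Mean cosine similarity of similar pairs minus that of dissimilar pairs."""
    def mean_cos(pairs: List[Pair]) -> float:
        scores = [cosine(sentence_vector(a, char_emb, dim),
                         sentence_vector(b, char_emb, dim)) for a, b in pairs]
        return sum(scores) / len(scores)

    return mean_cos(similar) - mean_cos(dissimilar)


if __name__ == "__main__":
    # Hypothetical toy character embeddings, for illustration only.
    emb = {"借": [1.0, 0.0], "款": [0.9, 0.1], "贷": [1.0, 0.1],
           "天": [0.0, 1.0], "气": [0.1, 0.9]}
    gap = similarity_gap([("借款", "贷款")], [("借款", "天气")], emb, dim=2)
    print(f"similarity gap: {gap:.3f}")
```

A larger gap means the embeddings separate similar from dissimilar pairs more clearly. Because cosine distance is 1 minus cosine similarity, the same gap appears in distance terms, with dissimilar pairs at the larger distance.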