Sentence Similarity Computation in Question Answering Robot
Shijing Si, Weiguo Zheng, Liuyang Zhou, Mei Zhang
DOI: https://doi.org/10.1088/1742-6596/1237/2/022093
2019-01-01
Journal of Physics: Conference Series
Abstract: Computing semantic similarity between sentences or texts is vital in many natural language processing (NLP) tasks, such as search, query suggestion, and question answering (QA). Many methods have been developed, based on lexical matching, distributional semantics, and other approaches. However, lexical features such as string matching fail to capture semantic similarity. In this research, we focus on implementing distributional representations and on tuning parameters when obtaining word representations with commonly used word embedding techniques, e.g., Word2Vec and GloVe. We conduct experiments on Chinese semantic sentence matching tasks in the finance domain. We assess the quality of word embeddings by the cosine similarity of both semantically similar and semantically dissimilar sentence pairs. In our experiments, Word2Vec performs better than GloVe in the sense that Chinese character embeddings from Word2Vec yield a larger disparity in cosine similarity between similar and dissimilar sentence pairs. We also report the optimal parameters found in our trials for Word2Vec continuous bag-of-words (CBOW), a window size of 6 and an embedding dimension of 400, which can serve as good initial values for other projects.
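The evaluation criterion described in the abstract, the gap in cosine similarity between similar and dissimilar sentence pairs, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: sentence vectors are formed by averaging character vectors, the disparity is the difference of mean similarities, and the two-dimensional `TOY_VECTORS`, the example pairs, and all function names are hypothetical.

```python
import math

# Hypothetical 2-dimensional character embeddings, for illustration only.
# A trained model would supply real vectors (e.g. 400 dimensions).
TOY_VECTORS = {
    "贷": [1.0, 0.0], "款": [0.9, 0.1],
    "利": [0.8, 0.2], "率": [0.7, 0.3],
    "天": [0.0, 1.0], "气": [0.1, 0.9],
}


def sentence_vector(sentence, char_vectors, dim):
    """Average the vectors of all characters that have an embedding."""
    total = [0.0] * dim
    count = 0
    for ch in sentence:
        vec = char_vectors.get(ch)
        if vec is None:
            continue  # skip characters without an embedding
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total  # all-zero vector when nothing is known
    return [t / count for t in total]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity(pairs, char_vectors, dim):
    """Mean cosine similarity over a non-empty list of sentence pairs."""
    sims = [
        cosine_similarity(
            sentence_vector(a, char_vectors, dim),
            sentence_vector(b, char_vectors, dim),
        )
        for a, b in pairs
    ]
    return sum(sims) / len(sims)


def disparity(similar_pairs, dissimilar_pairs, char_vectors, dim):
    """Mean similarity of similar pairs minus that of dissimilar pairs."""
    return (mean_similarity(similar_pairs, char_vectors, dim)
            - mean_similarity(dissimilar_pairs, char_vectors, dim))


similar = [("贷款利率", "利率")]    # hypothetical similar pair
dissimilar = [("贷款", "天气")]     # hypothetical dissimilar pair
gap = disparity(similar, dissimilar, TOY_VECTORS, dim=2)
print(f"disparity = {gap:.3f}")
```

Under this reading, a larger gap means the embedding separates similar pairs from dissimilar ones more clearly. The abstract does not name a training toolkit; if gensim 4.x were used (an assumption), the reported CBOW setting would roughly correspond to `Word2Vec(corpus, sg=0, window=6, vector_size=400)`.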