DRWS: A Model for Learning Distributed Representations for Words and Sentences

Chunwei Yan, Fan Zhang, Lian'en Huang
DOI: https://doi.org/10.1007/978-3-319-13560-1_16
2014-01-01
Abstract: Vector-space distributed representations of words can capture syntactic and semantic regularities in language and, by grouping similar words, help learning algorithms achieve better performance on natural language processing (NLP) tasks. With the progress of machine learning techniques in recent years, this field has received considerable attention. However, many NLP tasks, such as text summarization and sentence matching, treat sentences as atomic units and therefore need representations of whole sentences, not only of words. In this paper, we introduce a new model called DRWS, which learns distributed representations for both words and variable-length sentences. Feature vectors for words and sentences are learned with a neural network from the probabilities of co-occurrence between words and sentences. To evaluate the learned feature vectors, we applied our model to two tasks: word similarity detection and text summarization. Extensive experiments demonstrate the effectiveness of the proposed model in learning vector representations for words and sentences.
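The abstract says only that word and sentence vectors are learned with a neural network from word-sentence co-occurrence probabilities; it does not give the architecture or the loss. The code below is a minimal sketch under an explicit assumption: a PV-DBOW/skip-gram-style objective in which each sentence vector, and each word vector, predicts the other words of its sentence through a full softmax. This is a swapped-in technique for illustration, not the DRWS model itself. The function names `train_joint_vectors` and `cosine`, the toy corpus, and all hyperparameters are hypothetical. Cosine similarity is included because it is a common way to score word similarity, one of the evaluation tasks the abstract names.

```python
import math
import random


def train_joint_vectors(sentences, dim=8, epochs=200, lr=0.05, seed=0):
    """Hypothetical sketch of joint word/sentence embedding training.

    ASSUMED objective (not taken from the DRWS paper): each sentence vector
    predicts every word in its sentence, and each word vector predicts the
    other words of the same sentence, via a full softmax over the vocabulary.
    Returns (vocab, word_vecs, sent_vecs, per-epoch average loss).
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    def rand_vec():
        return [rng.uniform(-0.5, 0.5) for _ in range(dim)]

    word_vecs = [rand_vec() for _ in range(V)]   # input vectors for words
    sent_vecs = [rand_vec() for _ in sentences]  # one input vector per sentence
    out_vecs = [[0.0] * dim for _ in range(V)]   # output (softmax) word vectors

    def sgd_step(h, target):
        # Forward: numerically stable softmax over scores u_j . h
        scores = [sum(u[k] * h[k] for k in range(dim)) for u in out_vecs]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        probs = [e / z for e in exps]
        loss = -math.log(probs[target])
        # Backward: dL/dscore_j = p_j - [j == target]
        grad_h = [0.0] * dim
        for j in range(V):
            g = probs[j] - (1.0 if j == target else 0.0)
            for k in range(dim):
                grad_h[k] += g * out_vecs[j][k]   # uses pre-update u_j
                out_vecs[j][k] -= lr * g * h[k]   # uses pre-update h
        for k in range(dim):
            h[k] -= lr * grad_h[k]                # updates h in place
        return loss

    def training_pairs():
        for s_id, sent in enumerate(sentences):
            ids = [index[w] for w in sent]
            for t in ids:
                yield sent_vecs[s_id], t          # sentence -> word
            for i in ids:
                for t in ids:
                    if i != t:
                        yield word_vecs[i], t     # word -> co-occurring word

    history = []
    for _ in range(epochs):
        losses = [sgd_step(h, t) for h, t in training_pairs()]
        history.append(sum(losses) / len(losses))
    return vocab, word_vecs, sent_vecs, history


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat".split(),
        "the dog sat on the rug".split(),
        "a cat chased a mouse".split(),
        "a dog chased a ball".split(),
    ]
    vocab, W, S, hist = train_joint_vectors(corpus)
    print(f"avg loss: {hist[0]:.3f} -> {hist[-1]:.3f}")
    print("cos(cat, dog) =", round(cosine(W[vocab.index("cat")], W[vocab.index("dog")]), 3))
    print("cos(sent 0, sent 1) =", round(cosine(S[0], S[1]), 3))
```

The full softmax keeps the sketch short and exact on a toy vocabulary; training over a realistic vocabulary would more typically use negative sampling or a hierarchical softmax, which this sketch does not attempt.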