Unsupervised Learning for Semantic Representation of Short Text.

Chenxi Dong, Haoran Jia, Cong Wang
DOI: https://doi.org/10.1109/ccis.2018.8691363
2018-01-01
Abstract: Learning generic text embeddings remains a challenge for many natural language modeling applications. To address this, we propose a new model that learns general-purpose short-text representations. The model consists of two convolutional neural networks: one extracts semantic representations of short texts with the words in their normal order, and the other learns representations of the same texts with the words in reverse order. We train the model to minimize the difference between the two representations. In addition, we assume that the approximate posterior of the short-text semantic representations is Gaussian, and we minimize the KL divergence to map these representations into a low-dimensional space with a Gaussian distribution. Experiments on a Chinese text classification dataset show that our model achieves higher scores than baseline systems and learns representations that carry more semantic information, without supervision.
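The abstract gives no code, layer sizes, or loss weights, so the following is only a minimal pure-Python sketch of the described objective under stated assumptions, not the authors' implementation. Assumed (not from the paper): each encoder uses one convolution width with ReLU and max-pooling over time, followed by linear layers that output a mean and a log-variance; the "difference" between the two representations is the squared Euclidean distance between the two means; the KL term is taken against a standard normal prior N(0, I); and all terms are weighted equally. Every function name and hyperparameter (`conv1d_maxpool`, `encode`, `sketch_loss`, `random_params`, `width`, `latent`, and so on) is hypothetical.

```python
import math
import random

def conv1d_maxpool(embeddings, filters, bias):
    """Slide each filter over the word vectors, apply ReLU, then max-pool
    over time. Returns one feature per filter (hypothetical encoder layer)."""
    width = len(filters[0])  # number of consecutive words a filter spans
    features = []
    for f, b in zip(filters, bias):
        best = 0.0  # starting at 0.0 makes max() act as ReLU + max over time
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            s = b + sum(w * x
                        for f_row, vec in zip(f, window)
                        for w, x in zip(f_row, vec))
            best = max(best, s)
        features.append(best)
    return features

def linear(x, weights):
    """Matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def encode(embeddings, params):
    """Assumed encoder: CNN features -> Gaussian mean and log-variance."""
    h = conv1d_maxpool(embeddings, params["filters"], params["bias"])
    return linear(h, params["w_mu"]), linear(h, params["w_logvar"])

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) for a diagonal Gaussian."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sketch_loss(embeddings, fwd_params, rev_params):
    """Assumed objective: distance between the normal-order and reversed-order
    means, plus a KL term for each encoder, all with weight 1."""
    mu_f, lv_f = encode(embeddings, fwd_params)
    mu_r, lv_r = encode(list(reversed(embeddings)), rev_params)
    return (squared_distance(mu_f, mu_r)
            + kl_to_standard_normal(mu_f, lv_f)
            + kl_to_standard_normal(mu_r, lv_r))

def random_params(rng, dim, width, n_filters, latent):
    """Small random parameters for one hypothetical encoder."""
    def mat(rows, cols):
        return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "filters": [mat(width, dim) for _ in range(n_filters)],
        "bias": [0.0] * n_filters,
        "w_mu": mat(latent, n_filters),
        "w_logvar": mat(latent, n_filters),
    }

if __name__ == "__main__":
    rng = random.Random(0)
    sentence = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]  # 5 words, dim 8
    fwd = random_params(rng, dim=8, width=3, n_filters=4, latent=2)
    rev = random_params(rng, dim=8, width=3, n_filters=4, latent=2)
    print(sketch_loss(sentence, fwd, rev))
```

The two encoders get separate parameter sets because the abstract describes two networks; sharing weights between them would be an equally plausible variant. A real implementation would compute gradients of this loss with an autodiff framework rather than evaluate it in plain Python.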