A Text Representation Model Based on Convolutional Neural Network and Variational Auto Encoder.

Canyang Guo, Lin Xie, Genggeng Liu, Xin Wang
DOI: https://doi.org/10.1007/978-3-030-60029-7_21
2020-01-01
Abstract: In the era of big data, the internet produces vast amounts of data every day, with text data making up the dominant share. Manual processing cannot keep pace with the growth of text data. As the basis of most natural language processing (NLP) tasks, text representation aims to transform text into a vector that a computer can process without losing the important semantic information of the original. Effectively organizing, managing, and quickly using complex text information to extract useful semantics has therefore become an important research direction in NLP. This paper proposes a text feature representation model based on a convolutional neural network (CNN) and a variational auto encoder (VAE) that extracts text features and applies the resulting representation to text classification. The CNN extracts local features, and the VAE makes the extracted features more consistent with a Gaussian distribution. The proposed method outperforms w2v-avg and CNN-AE when the representations are classified with k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) algorithms.
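The encoder idea described in the abstract (convolution for local features, then a VAE head that pulls the features toward a Gaussian) can be illustrated with a minimal, untrained sketch. The abstract does not give filter widths, pooling, layer sizes, the decoder, or the loss weighting, so everything below is an assumption: max-over-time pooling, the parameter names (`filters`, `w_mu`, `w_lv`), and the toy dimensions are all hypothetical, and the code is not the authors' implementation.

```python
import math
import random


def conv1d_max_pool(embeddings, filters, width):
    """Apply each filter to every window of `width` consecutive word vectors,
    clip negative responses (ReLU), and keep the largest response per filter."""
    features = []
    for filt in filters:
        best = 0.0  # ReLU floor: negative responses are clipped to zero
        for start in range(len(embeddings) - width + 1):
            window = [x for vec in embeddings[start:start + width] for x in vec]
            best = max(best, sum(w * x for w, x in zip(filt, window)))
        features.append(best)
    return features


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def encode(embeddings, params, rng):
    """CNN features -> Gaussian parameters -> sampled text vector."""
    h = conv1d_max_pool(embeddings, params["filters"], params["width"])
    mu = linear(params["w_mu"], params["b_mu"], h)
    log_var = linear(params["w_lv"], params["b_lv"], h)
    return reparameterize(mu, log_var, rng), mu, log_var


# Toy usage with random (untrained) parameters; all sizes are hypothetical.
rng = random.Random(0)
emb_dim, width, n_filters, latent = 4, 2, 3, 2


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


params = {
    "width": width,
    "filters": rand_matrix(n_filters, width * emb_dim),
    "w_mu": rand_matrix(latent, n_filters),
    "b_mu": [0.0] * latent,
    "w_lv": rand_matrix(latent, n_filters),
    "b_lv": [0.0] * latent,
}
sentence = rand_matrix(5, emb_dim)  # five word vectors standing in for a text
z, mu, log_var = encode(sentence, params, rng)
kl = kl_to_standard_normal(mu, log_var)
```

In a trained model, the KL term would be added to a reconstruction loss so that the latent vectors stay close to a standard Gaussian; the mean vector `mu` (or the sample `z`) would then serve as the text representation passed to a classifier such as KNN, RF, or SVM.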