Convolutional Neural Networks for Sentence Classification

Yoon Kim
DOI: https://doi.org/10.48550/arXiv.1408.5882
2014-09-03
Abstract:We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.
Computation and Language,Neural and Evolutionary Computing
What problem does this paper attempt to address?
The problem that this paper attempts to solve is to improve the performance of the model in sentence - level classification tasks. Specifically, the author explores the method of using convolutional neural networks (CNN) combined with pre - trained word vectors to handle sentence classification tasks in natural language processing (NLP). The paper shows through a series of experiments that even with very little adjustment of hyper - parameters, a simple CNN model can achieve excellent results in multiple benchmark tests. In addition, by fine - tuning task - specific word vectors, the model performance is further improved. The paper also proposes a simple method that allows the simultaneous use of pre - trained static word vectors and task - specific non - static word vectors to enhance the expressiveness of the model. ### Main Research Questions 1. **Evaluating the Effect of Pre - trained Word Vectors**: Research on the performance of pre - trained word vectors in sentence classification tasks, especially in the absence of a large amount of labeled data. 2. **The Influence of Fine - tuning Word Vectors**: Explore whether fine - tuning pre - trained word vectors can further improve the model performance. 3. **The Effectiveness of Multi - channel Architectures**: Propose and verify a multi - channel CNN architecture that uses both static and non - static word vectors simultaneously to evaluate its effect in preventing over - fitting and improving model performance. ### Experimental Setup - **Data Sets**: The paper uses multiple data sets for experiments, including movie reviews (MR), Stanford Sentiment Treebank (SST - 1 and SST - 2), Subjectivity data set (Subj), TREC Question Classification data set, customer reviews (CR) and MPQA Opinion Polarity Detection subtask. - **Model Variants**: - **CNN - rand**: All word vectors are randomly initialized and updated during the training process. - **CNN - static**: Use pre - trained word vectors and keep them unchanged throughout the training process. - **CNN - non - static**: Use pre - trained word vectors and fine - tune them during the training process. - **CNN - multichannel**: A multi - channel model that uses both static and non - static word vectors simultaneously. ### Main Findings - **The Importance of Pre - trained Word Vectors**: Even a simple CNN model can achieve very good results in multiple benchmark tests by using pre - trained static word vectors. - **The Gain from Fine - tuning**: By fine - tuning pre - trained word vectors, the model performance can be further improved. - **The Mixed Effect of Multi - channel Models**: Multi - channel models perform well on some data sets, but not as well as single - channel models on other data sets, indicating that further research is needed on how to better utilize multi - channel architectures. ### Conclusion The paper proves through experiments that a simple CNN model using pre - trained word vectors performs well in sentence classification tasks, and the performance can be further improved by fine - tuning these word vectors. Although the multi - channel model theoretically helps prevent over - fitting, the actual effect is inconsistent and more research is needed to optimize it. Overall, pre - trained word vectors are an indispensable part of deep learning in natural language processing.