Recurrent Neural Network with Pooling Operation and Attention Mechanism for Sentiment Analysis: A Multi-Task Learning Approach

Yi Cai, Qingbao Huang, Zejun Lin, Jingyun Xu, Zhenhong Chen, Qing Li
DOI: https://doi.org/10.1016/j.knosys.2020.105856
IF: 8.139
2020-01-01
Knowledge-Based Systems
Abstract: Sentiment analysis aims to classify documents into a fixed number of pre-defined categories that represent different sentiments. To address the limitation of insufficient training data, multi-task learning models based on deep learning have recently achieved significant progress in this field. In general, these models leverage multiple datasets annotated for different tasks to improve the performance on each individual dataset. The improvement is particularly evident on tasks with limited training data. However, most of these models suffer from two limitations. First, they use the final output of the hidden layer as the overall representation of the text, which loses a certain amount of semantic information at the outset. Second, although some of them use a gate mechanism to select shared features, some irrelevant shared features are still erroneously used owing to polysemy. To address these two limitations, we integrate a pooling layer into a Bi-directional Recurrent Neural Network (BRNN) to extract semantic information comprehensively. We then apply the attention mechanism between shared layers and task-specific layers to identify the effective shared features, and propose an Attention-based Separate Pooling BRNN (ASP-BRNN) model. We conduct experiments on four datasets (SST1, SST2, SUBJ, and IMDB) to show the effectiveness of our models; accuracy increases steadily, by approximately 0.5% with each successive model. This demonstrates the effectiveness of every newly added component in solving the two problems. A further evaluation on eight datasets shows that our proposed ASP-BRNN model outperforms current state-of-the-art models, such as the ASP-MTL model (at least +0.2% on Electronics and at most +6.9% on IMDB) and the MT-ARC-II model (at least +0.2% on SST2 and at most +3.8% on DVDs).
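The two ideas in the abstract (pooling over all hidden states instead of keeping only the final one, and attention from task-specific states over shared states) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the pooling type, the attention scoring function, or how the outputs are combined, so the max/mean pooling, the scaled dot-product scoring, the concatenation, and all function names below are assumptions chosen for illustration. Random arrays stand in for real BRNN hidden states.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pooled_representation(hidden_states):
    """Summarize a sequence of hidden states of shape (T, d) by pooling over
    time, rather than keeping only the last step.
    Assumption: max and mean pooling, concatenated into a (2 * d,) vector."""
    return np.concatenate([hidden_states.max(axis=0), hidden_states.mean(axis=0)])

def attend_shared(task_states, shared_states):
    """Let each task-specific state weight the shared states.
    Assumption: scaled dot-product scoring (hypothetical choice).
    task_states: (T, d), shared_states: (T, d) -> attended: (T, d)"""
    scores = task_states @ shared_states.T / np.sqrt(task_states.shape[1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ shared_states

# Stand-ins for hidden states produced by task-specific and shared BRNN layers.
rng = np.random.default_rng(0)
T, d = 5, 4
task_h = rng.standard_normal((T, d))
shared_h = rng.standard_normal((T, d))

attended = attend_shared(task_h, shared_h)
# Concatenate task-specific and attended shared features per time step, then pool.
features = pooled_representation(np.concatenate([task_h, attended], axis=1))
print(features.shape)  # (16,)
```

In this sketch the pooled vector depends on every time step, and the attention weights decide how much each shared state contributes to a given task-specific position, which is the role the abstract assigns to the two components.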