Densely-connected Neural Networks for Aspect Term Extraction
Chen, Houfeng Wang, Qiuming Zhu, Junfei Liu
DOI: https://doi.org/10.1007/s11432-019-2775-9
2021-01-01
Abstract: Aspect term extraction (ATE) is a subtask of aspect-based sentiment analysis that aims to extract opinionated aspect terms from user reviews. For example, in the laptop-domain review “Boot time is super fast”, “boot time” is an aspect term, and the sentiment toward it is positive, as can be inferred from “super fast”. Existing approaches to ATE can be categorized as unsupervised, weakly supervised, or supervised. Supervised methods usually treat the task as a token-level sequence labeling problem and focus on feature extraction. Recently, automated feature learning with deep neural networks has been preferred because useful features are difficult to engineer manually [1–3]. Most models, however, use only the representations of the last layer as features for prediction. Peters et al. [4] showed that the representations in a neural network vary with depth: morphological information is encoded at the word embedding layer, local syntax is captured at lower layers, and longer-range semantics are encoded at upper layers. Thus, a traditional multi-layer neural model may lose low-level features, while a single-layer model cannot obtain high-level features. Inspired by [5], we propose a densely-connected multi-layer neural network model for ATE that combines the features from every layer. Specifically, the model contains three components: (1) a double embedding mechanism that combines general embeddings and domain embeddings to improve embedding quality; (2) multi-layer BiLSTM networks that process the input in the forward and backward directions, generating token-level representations that record sequential information; and (3) a self-attention mechanism that directly connects any two words in a sentence, offering a more flexible way to represent context-dependency features that complements the BiLSTMs.
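To make the contrast between last-layer features and densely-connected features concrete, here is a minimal sketch. It assumes that each layer emits one feature vector per token and that the layers are combined by token-wise concatenation. The function names, the toy values, and the layer widths are hypothetical illustrations, not the authors' code.

```python
from typing import List

# One matrix per layer: T rows (one per token), each row a feature vector.
Matrix = List[List[float]]


def last_layer_features(layer_outputs: List[Matrix]) -> Matrix:
    """Conventional choice: keep only the top layer's representations."""
    return [list(row) for row in layer_outputs[-1]]


def dense_features(layer_outputs: List[Matrix]) -> Matrix:
    """Densely-connected choice: concatenate every layer's vector for each token."""
    if not layer_outputs:
        raise ValueError("need at least one layer output")
    num_tokens = len(layer_outputs[0])
    if any(len(layer) != num_tokens for layer in layer_outputs):
        raise ValueError("every layer must produce one vector per token")
    return [[v for layer in layer_outputs for v in layer[t]] for t in range(num_tokens)]


# Hypothetical outputs for a 2-token sentence (values and widths are made up).
embedding_layer = [[1.0, 2.0], [3.0, 4.0]]            # width 2: low-level features
bilstm_layer = [[0.5], [0.6]]                         # width 1: sequential features
attention_layer = [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0]]  # width 3: top-layer features
layers = [embedding_layer, bilstm_layer, attention_layer]

print(last_layer_features(layers)[0])  # [9.0, 8.0, 7.0]
print(dense_features(layers)[0])       # [1.0, 2.0, 0.5, 9.0, 8.0, 7.0]
```

With dense features, the embedding-layer values survive all the way to the classifier. Keeping only the last layer discards them. The width of the final feature vector is the sum of the per-layer widths.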
Last, we concatenate the representations learned by all preceding layers as the final features for extracting aspect terms.

Methodology. Given a review sentence consisting of a sequence of tokens $s = \{w_1, w_2, \ldots, w_T\}$, where $T$ is the number of tokens, we aim to predict an aspect label sequence $y = \{y_1, y_2, \ldots, y_T\}$ for $s$. Each token $w_t$ is assigned a label $y_t$ from the finite label set $Y = \{B, I, O\}$, where B, I, and O denote the beginning of an aspect term, the inside of an aspect term, and a non-aspect word, respectively. The steps of the model are described as follows.

Step 1. Double embedding mechanism. Each token $w_t$ in the review sentence is mapped to two continuous representations using two pre-trained embedding matrices: a general embedding $x_t^g$ and a domain-specific embedding $x_t^d$. The domain embeddings match the domain of the datasets. Because the general (global) and domain embeddings are trained individually on different datasets, their embedding spaces differ. To preserve the context features of each for label prediction, we build the initial token-level contextualized representations from the global and domain embeddings with separate BiLSTMs [6]. Let LSTM denote an LSTM unit and $S \in \{G, D\}$ the task indicator, where G and D denote the global and domain tasks, respectively. The calculation proceeds as follows.
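As a worked example of the B/I/O labeling, the review “Boot time is super fast” with the aspect term “boot time” is labeled B, I, O, O, O. The sketch below converts aspect spans to labels and decodes labels back to terms. The helper names and the half-open span convention are hypothetical. Treating a stray I (one that does not follow a B or I) as the start of a new term is an assumed decoding rule, not one stated in the paper.

```python
from typing import List, Tuple


def spans_to_bio(num_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Label tokens B/I/O given aspect spans as half-open [start, end) token indices."""
    labels = ["O"] * num_tokens
    for start, end in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"invalid span ({start}, {end})")
        labels[start] = "B"  # first token of the aspect term
        for t in range(start + 1, end):
            labels[t] = "I"  # remaining tokens inside the term
    return labels


def bio_to_terms(tokens: List[str], labels: List[str]) -> List[str]:
    """Recover aspect terms; assumption: a stray I starts a new term."""
    terms: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B" or (label == "I" and not current):
            if current:  # close the previous term before opening a new one
                terms.append(" ".join(current))
            current = [token]
        elif label == "I":
            current.append(token)
        else:  # "O" ends any open term
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


tokens = "Boot time is super fast".split()
labels = spans_to_bio(len(tokens), [(0, 2)])
print(labels)                        # ['B', 'I', 'O', 'O', 'O']
print(bio_to_terms(tokens, labels))  # ['Boot time']
```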
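The two-branch encoding in Step 1 can be sketched in plain Python. Each token is looked up in a general table and a domain table, and each sequence is then run through its own BiLSTM, indexed by the task indicator G or D. This sketch assumes the standard LSTM gate equations (input, forget, and output gates with a tanh candidate) and small random weights. The paper's exact equations may differ. The embedding tables, their dimensions, and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Standard LSTM cell (assumed form): gates i, f, o and candidate g."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        width = input_dim + hidden_dim + 1  # weights act on [x; h_prev; 1], the 1 is a bias
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(hidden_dim)]
            for gate in ("i", "f", "o", "g")
        }

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h + [1.0]
        pre = {gate: [sum(w * v for w, v in zip(row, z)) for row in rows]
               for gate, rows in self.weights.items()}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
        h_new = [ot * math.tanh(ct) for ot, ct in zip(o, c_new)]
        return h_new, c_new


def run_lstm(cell: LSTMCell, inputs: List[Vector]) -> List[Vector]:
    """Run one direction from a zero state and return the hidden state per token."""
    h, c = [0.0] * cell.hidden_dim, [0.0] * cell.hidden_dim
    outputs = []
    for x in inputs:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


class BiLSTM:
    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.forward = LSTMCell(input_dim, hidden_dim, rng)
        self.backward = LSTMCell(input_dim, hidden_dim, rng)

    def __call__(self, inputs: List[Vector]) -> List[Vector]:
        fwd = run_lstm(self.forward, inputs)
        bwd = run_lstm(self.backward, inputs[::-1])[::-1]  # re-align to token order
        return [f + b for f, b in zip(fwd, bwd)]  # concatenate both directions


def double_embedding_encode(tokens: List[str], general_table: Dict[str, Vector],
                            domain_table: Dict[str, Vector],
                            encoders: Dict[str, BiLSTM]) -> Dict[str, List[Vector]]:
    """Encode general (G) and domain (D) embeddings with separate BiLSTMs."""
    x_g = [general_table[w] for w in tokens]  # general embeddings x_t^g
    x_d = [domain_table[w] for w in tokens]   # domain embeddings x_t^d
    return {"G": encoders["G"](x_g), "D": encoders["D"](x_d)}


rng = random.Random(0)
tokens = "Boot time is super fast".split()
general_table = {w: [rng.uniform(-1, 1) for _ in range(3)] for w in tokens}  # toy 3-d vectors
domain_table = {w: [rng.uniform(-1, 1) for _ in range(2)] for w in tokens}   # toy 2-d vectors
encoders = {"G": BiLSTM(3, 4, rng), "D": BiLSTM(2, 4, rng)}
reps = double_embedding_encode(tokens, general_table, domain_table, encoders)
print(len(reps["G"]), len(reps["G"][0]))  # 5 8
```

The two branches never share weights, so each BiLSTM learns within its own embedding space. This reflects why the text encodes the general and domain embeddings separately rather than mixing their raw vectors.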