Discriminative Neural Sentence Modeling by Tree-Based Convolution

Lili Mou, Hao Peng, Ge Li, Yan Xu, Lu Zhang, Zhi Jin
DOI: https://doi.org/10.48550/arXiv.1504.01106
2015-06-02
Abstract: This paper proposes a tree-based convolutional neural network (TBCNN) for discriminative sentence modeling. Our models leverage either constituency trees or dependency trees of sentences. The tree-based convolution process extracts sentences' structural features, and these features are aggregated by max pooling. Such architecture allows short propagation paths between the output layer and underlying feature detectors, which enables effective structural feature learning and extraction. We evaluate our models on two tasks: sentiment analysis and question classification. In both experiments, TBCNN outperforms previous state-of-the-art results, including existing neural networks and dedicated feature/rule engineering. We also make efforts to visualize the tree-based convolution process, shedding light on how our models work.
Computation and Language, Machine Learning, Neural and Evolutionary Computing
What problem does this paper attempt to address?
This paper addresses how to capture sentence structure effectively in sentence modeling, with the goal of improving sentiment analysis and question classification. Existing neural models each have a limitation when processing natural language. Convolutional neural networks (CNNs) extract local features well but do not capture a sentence's internal structure. Recursive neural networks (RNNs) can encode structural information to some extent, but their long propagation paths can cause information loss or make training difficult. To address both problems, the paper proposes a tree-based convolutional neural network (TBCNN), which extracts structural features from a sentence's syntactic tree (a constituency tree or a dependency tree) and aggregates them by max pooling. This architecture keeps the propagation paths between the underlying feature detectors and the output layer short, which makes structural features easier to learn and extract. In experiments on sentiment analysis and question classification, TBCNN outperforms the previous state-of-the-art results.
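The two steps described above, convolution over tree windows followed by max pooling, can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes a window made of a node and its direct children, one weight matrix for the parent and one shared by all children, and a tanh activation. The names `tree_conv_max_pool`, `matvec`, `W_parent`, and `W_child`, and the toy tree, are hypothetical and chosen only for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def tree_conv_max_pool(embeddings, children, W_parent, W_child, bias):
    """Slide a one-level window over every tree node, then max-pool.

    embeddings: one input vector per node.
    children:   children[i] lists the indices of node i's direct children.
    Simplifying assumption: all children share the single matrix W_child.
    """
    features = []
    for node, kids in enumerate(children):
        # One convolution window: the node itself plus its direct children.
        z = [a + b for a, b in zip(matvec(W_parent, embeddings[node]), bias)]
        for k in kids:
            z = [a + b for a, b in zip(z, matvec(W_child, embeddings[k]))]
        features.append([math.tanh(v) for v in z])
    # Max pooling per dimension gives a fixed-size vector for any tree size.
    return [max(column) for column in zip(*features)]


random.seed(0)
d_in, d_out = 4, 3
# Toy tree for a five-token sentence; node 0 is the root.
children = [[1, 2], [3], [4], [], []]
embeddings = [[random.gauss(0, 1) for _ in range(d_in)] for _ in children]
W_parent = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
W_child = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
bias = [0.0] * d_out

sentence_vec = tree_conv_max_pool(embeddings, children, W_parent, W_child, bias)
```

In this sketch every pooled feature comes from a single window, so a classifier placed on top of `sentence_vec` is only a few layers away from any feature detector regardless of tree depth, which is the short propagation path the summary refers to.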