Neural Probabilistic Model for Non-projective MST Parsing

Xuezhe Ma, Eduard Hovy
DOI: https://doi.org/10.48550/arXiv.1701.00874
2017-09-04
Abstract: In this paper, we propose a probabilistic parsing model, which defines a proper conditional probability distribution over non-projective dependency trees for a given sentence, using neural representations as inputs. The neural network architecture is based on bi-directional LSTM-CNNs which benefit from both word- and character-level representations automatically, by using a combination of bidirectional LSTM and CNN. On top of the neural network, we introduce a probabilistic structured layer, defining a conditional log-linear model over non-projective trees. We evaluate our model on 17 different datasets, across 14 different languages. By exploiting Kirchhoff's Matrix-Tree Theorem (Tutte, 1984), the partition functions and marginals can be computed efficiently, leading to a straight-forward end-to-end model training procedure via back-propagation. Our parser achieves state-of-the-art parsing performance on nine datasets.
Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of modeling non-projective dependency parsing in natural language processing. Specifically, the authors propose a neural probabilistic model whose bidirectional LSTM-CNN architecture draws on both word-level and character-level representations automatically. On top of these neural representations they build a probabilistic structured layer that defines a conditional distribution over all non-projective dependency trees for a given sentence. Using Kirchhoff's Matrix-Tree Theorem, the model computes the partition function and the marginal probabilities efficiently, which enables end-to-end training through back-propagation. Evaluated on 17 datasets across 14 languages, the parser achieves state-of-the-art performance on nine datasets. In short, the main goal of the paper is a non-projective dependency parser that works effectively across many languages. In particular, for languages or domains that traditional methods handle poorly, it can reduce the dependence on hand-crafted features and task-specific resources and improve the model's adaptability and performance.
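The Matrix-Tree computation mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses the standard directed form of the theorem with an artificial root at node 0, which may differ in detail from the paper's formulation. The arc scores here are arbitrary numbers, whereas in the paper they would come from the neural network. The function names (determinant, partition_function, brute_force_partition) are hypothetical. The brute-force enumeration is included only to check the determinant against the definition of the partition function.

```python
import itertools
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def partition_function(scores):
    """Sum of exp(tree score) over all dependency trees rooted at node 0.

    scores[h][m] is the score of the arc from head h to modifier m.
    Node 0 is an artificial root (assumption of this sketch).
    """
    n = len(scores)
    size = n - 1  # Laplacian with the root row and column removed
    lap = [[0.0] * size for _ in range(size)]
    for m in range(1, n):
        for h in range(n):
            if h == m:
                continue
            weight = math.exp(scores[h][m])
            lap[m - 1][m - 1] += weight      # total incoming weight of m
            if h != 0:
                lap[h - 1][m - 1] -= weight  # off-diagonal entry
    return determinant(lap)


def is_tree(heads):
    """heads[m - 1] is the head of word m; every word must reach node 0."""
    for m in range(1, len(heads) + 1):
        seen = set()
        node = m
        while node != 0:
            if node in seen:
                return False  # cycle, so not a tree
            seen.add(node)
            node = heads[node - 1]
    return True


def brute_force_partition(scores):
    """Same quantity by explicit enumeration (exponential, for checking only)."""
    n = len(scores)
    total = 0.0
    for heads in itertools.product(range(n), repeat=n - 1):
        if is_tree(heads):
            total += math.exp(
                sum(scores[h][m] for m, h in enumerate(heads, start=1))
            )
    return total


if __name__ == "__main__":
    example = [
        [0.0, 0.5, -1.0, 0.2],
        [0.0, 0.0, 1.3, -0.4],
        [0.0, 0.7, 0.0, 0.9],
        [0.0, -0.2, 0.1, 0.0],
    ]
    print(partition_function(example), brute_force_partition(example))
```

With all scores set to zero and three words, the partition function is 16 (up to floating-point rounding), the number of labeled trees on four nodes. The log-probability of a particular tree is its total arc score minus the logarithm of this value. The determinant costs cubic time in the sentence length, whereas the enumeration grows exponentially.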