LIMIT-BERT: Linguistic Informed Multi-Task BERT

Junru Zhou, Zhuosheng Zhang, Hai Zhao, Shuailiang Zhang
DOI: https://doi.org/10.48550/arXiv.1910.14296
2020-10-06
Abstract: In this paper, we present a Linguistic Informed Multi-Task BERT (LIMIT-BERT) for learning language representations across multiple linguistic tasks by Multi-Task Learning (MTL). LIMIT-BERT includes five key linguistic syntax and semantics tasks: Part-Of-Speech (POS) tagging, constituent and dependency syntactic parsing, and span and dependency semantic role labeling (SRL). In addition, LIMIT-BERT adopts a linguistic masking strategy, Syntactic and Semantic Phrase Masking, which masks all of the tokens corresponding to a syntactic/semantic phrase. Unlike the recent Multi-Task Deep Neural Networks (MT-DNN) (Liu et al., 2019), LIMIT-BERT is linguistically motivated and is trained in a semi-supervised manner that provides as much linguistic-task data as the BERT training corpus. As a result, LIMIT-BERT not only improves performance on linguistic tasks but also benefits from a regularization effect and from linguistic information, which lead to more general representations that help it adapt to new tasks and domains. LIMIT-BERT obtains new state-of-the-art or competitive results on both span and dependency semantic parsing on PropBank benchmarks and on both dependency and constituent syntactic parsing on the Penn Treebank.
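Syntactic and Semantic Phrase Masking differs from BERT's token-level masking in that a whole phrase is masked at once. The Python sketch below illustrates that idea under stated assumptions: the phrase spans are hand-written stand-ins for constituent-parser or SRL output, the 15% masking budget is borrowed from BERT's default rate, and phrases are picked in random order. `phrase_mask` is a hypothetical helper, not the authors' implementation, and it omits BERT's random-token and keep-original replacements.

```python
import random
from typing import List, Sequence, Set, Tuple

MASK = "[MASK]"


def phrase_mask(
    tokens: Sequence[str],
    spans: Sequence[Tuple[int, int]],
    budget: float = 0.15,
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: mask whole phrases.

    `spans` are half-open [start, end) token ranges for syntactic or semantic
    phrases. Phrases are picked in random order and masked in full until at
    least `budget` of the tokens are masked (the last phrase may overshoot).
    """
    rng = random.Random(seed)
    target = max(1, round(budget * len(tokens)))  # assumed 15% rate, as in BERT
    order = list(spans)
    rng.shuffle(order)
    masked: Set[int] = set()
    for start, end in order:
        if len(masked) >= target:
            break
        positions = set(range(start, end))
        if positions & masked:
            continue  # skip phrases overlapping one that is already masked
        masked |= positions  # mask every token of the phrase, never a fragment
    out = [MASK if i in masked else tok for i, tok in enumerate(tokens)]
    return out, sorted(masked)


if __name__ == "__main__":
    tokens = "the quick brown fox jumped over the lazy dog".split()
    # Hand-written phrase spans standing in for parser / SRL output.
    spans = [(0, 4), (4, 5), (5, 9), (6, 9)]
    masked_tokens, positions = phrase_mask(tokens, spans, seed=1)
    print(" ".join(masked_tokens))
    print(positions)
```

Passing word-level spans over subword tokens instead of phrase spans would give a rough analogue of Whole Word Masking, which is one way to read phrase masking as a coarser-grained version of the same idea.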
Computation and Language, Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is how to integrate linguistic knowledge (such as syntactic and semantic information) into pre-trained language representation models through multi-task learning (MTL) and semi-supervised learning, in order to improve the model's performance on a range of linguistic tasks. Specifically, the paper proposes Linguistic Informed Multi-Task BERT (LIMIT-BERT), which is trained on the following five key linguistic tasks:

1. **Part-Of-Speech (POS) Tagging**: Identify the part of speech of each word in a sentence.
2. **Constituent Syntactic Parsing**: Build a constituency tree that represents the phrase structure of a sentence.
3. **Dependency Syntactic Parsing**: Identify the head-dependent relations between pairs of words in a sentence.
4. **Span-based Semantic Role Labeling (SRL)**: Identify the predicate-argument structures in a sentence, such as "who did what to whom".
5. **Dependency-based SRL**: Similar to span-based SRL, but label only the syntactic head of each argument instead of the entire argument span.

### Main Contributions

1. **Linguistically Driven Multi-Task Learning**: LIMIT-BERT is a fully linguistically driven multi-task model whose masking objective is adapted to syntactic and semantic phrases.
2. **Semi-Supervised Learning Strategy**: Pre-trained linguistic models automatically label a large amount of unlabeled text, and this data is combined with gold-standard linguistic-task data to alleviate the data-imbalance problem in multi-task learning.
3. **Improved Masking Strategy**: Syntactic and Semantic Phrase Masking (SPM), which mask all tokens of a syntactic or semantic phrase, are used together with Whole Word Masking (WWM) to improve model performance.
4. **Extensive Experimental Verification**: Experiments on multiple benchmarks, including CoNLL-2005, CoNLL-2009, the Penn Treebank (PTB), the GLUE benchmark, and SNLI, demonstrate the effectiveness of LIMIT-BERT.

### Experimental Results

- **Syntactic Parsing**: On PTB, LIMIT-BERT achieves an F1 score of 95.84 for constituent parsing, and 97.14% UAS (unlabeled attachment score) and 95.44% LAS (labeled attachment score) for dependency parsing, without fine-tuning.
- **Semantic Parsing**: On CoNLL-2005 and CoNLL-2009, LIMIT-BERT outperforms the baseline BERT WWM in the end-to-end setting, especially on dependency SRL, where F1 improves by 0.7.
- **Natural Language Understanding Tasks**: On the GLUE benchmark and SNLI, LIMIT-BERT also outperforms the baseline BERT WWM.

### Conclusion

LIMIT-BERT integrates linguistic knowledge into a pre-trained language representation model through multi-task learning and semi-supervised learning, improving performance across the linguistic tasks evaluated. The results suggest that linguistic information not only helps specific tasks but also improves the model's generalization, helping it adapt to new tasks and domains.
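The dependency-parsing results for LIMIT-BERT are reported as UAS and LAS. As a rough illustration of what those two numbers measure (a sketch, not the paper's evaluation script), the hypothetical `attachment_scores` function below scores each token: UAS counts tokens whose predicted head is correct, and LAS additionally requires the dependency label to match. Heads are assumed to be 1-based indices with 0 for the root. Standard PTB evaluation commonly excludes punctuation tokens; this sketch scores every token.

```python
from typing import Sequence, Tuple


def attachment_scores(
    gold_heads: Sequence[int],
    gold_labels: Sequence[str],
    pred_heads: Sequence[int],
    pred_labels: Sequence[str],
) -> Tuple[float, float]:
    """Hypothetical helper returning (UAS, LAS) as percentages.

    Heads are 1-based indices of each token's governor, with 0 for the root.
    Every token is scored (no punctuation filtering).
    """
    n = len(gold_heads)
    if n == 0 or not (len(gold_labels) == len(pred_heads) == len(pred_labels) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    # A token counts toward UAS if its predicted head matches the gold head.
    head_ok = [g == p for g, p in zip(gold_heads, pred_heads)]
    uas = sum(head_ok)
    # LAS also requires the predicted label to match the gold label.
    las = sum(ok and gl == pl for ok, gl, pl in zip(head_ok, gold_labels, pred_labels))
    return 100.0 * uas / n, 100.0 * las / n


if __name__ == "__main__":
    # "She reads books ." with one wrong label (obj -> iobj) and one wrong head.
    print(attachment_scores(
        [2, 0, 2, 2], ["nsubj", "root", "obj", "punct"],
        [2, 0, 2, 3], ["nsubj", "root", "iobj", "punct"],
    ))  # (75.0, 50.0)
```

Because LAS requires everything UAS requires plus a correct label, LAS can never exceed UAS, which is consistent with the 97.14 / 95.44 ordering in the reported results.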