Semantic-based Pre-training for Dialogue Understanding

Xuefeng Bai, Linfeng Song, Yue Zhang
DOI: https://doi.org/10.48550/arXiv.2209.09146
2022-09-20
Abstract: Pre-trained language models have made great progress on dialogue tasks. However, these models are typically trained on surface dialogue text, thus are proven to be weak in understanding the main semantic meaning of a dialogue context. We investigate Abstract Meaning Representation (AMR) as explicit semantic knowledge for pre-training models to capture the core semantic information in dialogues during pre-training. In particular, we propose a semantic-based pre-training framework that extends the standard pre-training framework (Devlin et al., 2019) by three tasks for learning 1) core semantic units, 2) semantic relations and 3) the overall semantic representation according to AMR graphs. Experiments on the understanding of both chit-chats and task-oriented dialogues show the superiority of our model. To our knowledge, we are the first to leverage a deep semantic representation for dialogue pre-training.
Computation and Language
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper "Semantic-based Pre-training for Dialogue Understanding" addresses a shortcoming of existing pre-trained language models on dialogue tasks: because these models are usually trained only on surface dialogue text, they are weak at understanding the core semantic information of a dialogue context. The authors propose a new pre-training framework that uses Abstract Meaning Representation (AMR) as explicit semantic knowledge to capture this core semantic information.

### Main contributions

1. **Introducing AMR as explicit semantic knowledge**:
   - The authors use AMR graphs to represent the core semantic units of a dialogue and the relations between them, strengthening the model's semantic understanding during pre-training.
2. **Proposing a semantics-guided pre-training framework**:
   - The framework extends the standard pre-training framework with three tasks:
     - **Semantics-guided Masked Language Modeling**: modifies masked language modeling to focus on core semantic units.
     - **Semantic Relation Prediction**: learns the semantic relations between words.
     - **Semantic Agreement**: maximizes the overall similarity between a dialogue and its corresponding AMR graph.
3. **Experimental verification**:
   - The experimental results show that the framework outperforms existing pre-training methods on both chit-chat and task-oriented dialogue understanding tasks. In particular, it achieves new state-of-the-art results when less training data is used.

### Method overview

1. **Semantics-guided Masked Language Modeling**:
   - Important semantic units in the text are identified by aligning them to AMR nodes and are given a higher masking probability, so the model pays more attention to them.
2. **Semantic Relation Prediction**:
   - The edges of the AMR graph are projected onto the text, and a predictor is trained to recover these edges, so the model learns the semantic relations between words.
3. **Semantic Agreement**:
   - An auxiliary network encodes the AMR graph, and the similarity between the hidden state of the text and the hidden state of the AMR graph is maximized, so the model learns to capture the semantics of the dialogue as a whole.

### Experimental results

- **Dialogue relation extraction task**:
  - SARA-BERT's F1 scores on the two test sets are 1.4 and 2.2 points higher than those of BERT_c, respectively, showing the effectiveness of the semantic pre-training framework.
- **DialoGLUE benchmark**:
  - SARA-BERT outperforms BERT on all 7 datasets, with an average improvement of 1.1 points. On HWU64 and MultiWOZ in particular, it is 2.1 and 3.0 points higher than BERT, respectively.

### Conclusion

The AMR-based semantic pre-training framework proposed in this paper significantly improves the semantic understanding ability of dialogue systems, especially on complex dialogues. It achieves new state-of-the-art results on multiple dialogue tasks and is also data-efficient when less training data is available.
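The semantics-guided masking step from the method overview can be illustrated with a minimal Python sketch. It assumes the AMR-to-token alignment has already been computed and is passed in as a set of token positions (`semantic_positions`). The function name `semantic_guided_mask` and the default rates `p_semantic=0.3` and `p_other=0.1` are hypothetical choices for illustration, not values taken from the paper. For brevity the sketch always substitutes `[MASK]`, omitting BERT's usual random-token and keep-unchanged replacements.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

MASK_TOKEN = "[MASK]"


def semantic_guided_mask(
    tokens: Sequence[str],
    semantic_positions: Set[int],
    p_semantic: float = 0.3,  # hypothetical rate for AMR-aligned tokens
    p_other: float = 0.1,     # hypothetical rate for all other tokens
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask AMR-aligned tokens with a higher probability than other tokens.

    Returns the corrupted sequence and, per position, the original token as the
    prediction target if it was masked, or None if it was left unchanged.
    """
    rng = random.Random(seed)
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for i, tok in enumerate(tokens):
        # Aligned (semantic) positions get the higher masking probability.
        p = p_semantic if i in semantic_positions else p_other
        if rng.random() < p:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels
```

With `p_semantic=1.0` and `p_other=0.0` the behavior is deterministic: `semantic_guided_mask(["i", "want", "to", "book", "a", "flight"], {1, 3, 5}, p_semantic=1.0, p_other=0.0)` masks exactly `want`, `book` and `flight`, assuming those three tokens are the ones aligned to AMR concepts.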
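Semantic relation prediction can be sketched in two parts: projecting AMR edges onto token pairs to obtain training labels, and scoring a predictor's output with a softmax cross-entropy loss. This is a hedged sketch under stated assumptions: the node-to-token alignment (`node_to_token`) is given, every ordered token pair is treated as one classification instance with a `none` label when no edge is projected onto it, and the per-pair logits stand in for whatever relation classifier sits on top of the encoder. The names `project_amr_edges` and `relation_loss` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

NO_REL = "none"  # assumed label for token pairs with no projected AMR edge


def project_amr_edges(
    num_tokens: int,
    node_to_token: Dict[str, int],
    edges: Sequence[Tuple[str, str, str]],
) -> List[List[str]]:
    """Build a token-by-token label matrix from AMR edges.

    labels[i][j] holds the relation of an AMR edge whose source node is aligned
    to token i and whose target node is aligned to token j. Edges with an
    unaligned endpoint are skipped.
    """
    labels = [[NO_REL] * num_tokens for _ in range(num_tokens)]
    for src, rel, tgt in edges:
        if src in node_to_token and tgt in node_to_token:
            labels[node_to_token[src]][node_to_token[tgt]] = rel
    return labels


def relation_loss(
    logits: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[Sequence[str]],
    vocab: Sequence[str],
) -> float:
    """Mean softmax cross-entropy over all ordered token pairs.

    logits[i][j][k] is the predictor's score for relation vocab[k] from token i
    to token j.
    """
    index = {rel: k for k, rel in enumerate(vocab)}
    total, count = 0.0, 0
    for i, row in enumerate(labels):
        for j, gold in enumerate(row):
            scores = logits[i][j]
            m = max(scores)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            total += log_z - scores[index[gold]]
            count += 1
    return total / count


# "The boy wants to go": (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))
tokens = ["the", "boy", "wants", "to", "go"]
node_to_token = {"w": 2, "b": 1, "g": 4}  # hypothetical alignment
edges = [("w", ":ARG0", "b"), ("w", ":ARG1", "g"), ("g", ":ARG0", "b")]
labels = project_amr_edges(len(tokens), node_to_token, edges)
vocab = [NO_REL, ":ARG0", ":ARG1"]
uniform = [[[0.0] * len(vocab) for _ in tokens] for _ in tokens]
print(relation_loss(uniform, labels, vocab))  # approximately log(3), about 1.0986
```

Uniform scores give a loss of log(3) for a three-label vocabulary, which is the chance-level baseline a trained predictor should fall below.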
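The semantic agreement objective can be sketched as a loss that rewards similarity between a pooled text representation and a pooled AMR-graph representation. In this minimal sketch, mean pooling and cosine similarity (loss = 1 - cosine) are assumptions chosen for illustration; the paper's auxiliary graph encoder and its exact similarity function are not reproduced, so `graph_states` simply stands in for that encoder's output. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(states: Sequence[Sequence[float]]) -> Vector:
    """Average a sequence of hidden-state vectors into a single vector."""
    dim = len(states[0])
    return [sum(s[d] for s in states) / len(states) for d in range(dim)]


def cosine_similarity(u: Sequence[float], v: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity, with eps guarding against zero-norm vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, eps)


def agreement_loss(
    text_states: Sequence[Sequence[float]],
    graph_states: Sequence[Sequence[float]],
) -> float:
    """Return 1 - cos(pooled text state, pooled graph state).

    text_states: per-token hidden states from the dialogue encoder.
    graph_states: per-node hidden states from an assumed auxiliary AMR encoder.
    Minimizing this loss maximizes the similarity of the two representations.
    """
    return 1.0 - cosine_similarity(mean_pool(text_states), mean_pool(graph_states))
```

For example, `agreement_loss([[1.0, 2.0], [3.0, 4.0]], [[2.0, 3.0]])` is numerically zero because both sides pool to `[2.0, 3.0]`, while orthogonal pooled vectors give a loss of 1. Cosine similarity is used here because it ignores vector magnitude, so the loss measures only whether the two representations point in the same direction.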