Abstract: Semantic role labeling (SRL) aims to identify the predicate-argument structure of a sentence. Inspired by the strong correlation between syntax and semantics, previous works have paid much attention to improving SRL performance by exploiting syntactic knowledge, achieving significant results. Pipeline methods based on automatic syntactic trees and multi-task learning (MTL) approaches using standard syntactic trees are two common research orientations. In this paper, we adopt a simple unified span-based model for both span-based and word-based Chinese SRL as a strong baseline. Besides, we present an MTL framework that includes the basic SRL module and a dependency parser module. Different from the commonly used hard parameter sharing strategy in MTL, the main idea is to extract implicit syntactic representations from the dependency parser as external inputs for the basic SRL model. Experiments on the benchmarks of Chinese Proposition Bank 1.0 and CoNLL-2009 Chinese datasets show that our proposed framework can effectively improve the performance over the strong baselines. With the external BERT representations, our framework achieves new state-of-the-art 87.54 and 88.5 F1 scores on the test data of the two benchmarks, respectively. In-depth analyses are conducted to gain more insights into the proposed framework and the effectiveness of syntax.
What problem does this paper attempt to address?
This paper aims to improve the performance of Chinese semantic role labeling (SRL). Specifically, the authors improve the SRL model by introducing syntactic information, especially when dealing with long-distance dependencies and different semantic roles.
### Problem Background
Semantic role labeling (SRL) is an important task in natural language processing. The goal is to identify the predicate-argument structure of a sentence (i.e., "who did what to whom, when and where"). Existing studies on Chinese SRL are relatively few, mainly because of the limited amount of data and the limited attention the task has received from researchers. Common Chinese SRL datasets include Chinese Proposition Bank 1.0 (CPB1.0) and the CoNLL-2009 Chinese dataset.
### Core Problems of the Paper
1. **Introducing Syntactic Information**: How to effectively integrate syntactic information into the SRL model to improve its performance.
2. **Multi-task Learning Framework**: Design a multi-task learning (MTL) framework so that the SRL model can utilize the implicit syntactic representations provided by the dependency parser.
3. **Unified Model**: Propose a simple unified model suitable for span-based and word-based Chinese SRL tasks.
### Solutions
To achieve the above goals, the authors propose the following solutions:
1. **Simple Unified SRL Model**:
- Adopt a unified span-based model that handles both span-based and word-based Chinese SRL, and use it as a strong baseline.
2. **Multi-task Learning Framework**:
- Construct an MTL framework that includes a basic SRL module and a dependency parser module.
- Unlike the common hard parameter sharing strategy, this framework extracts implicit syntactic representations from the dependency parser and provides them as external inputs to the basic SRL model.
3. **Experimental Verification**:
- Experiments were carried out on two benchmark datasets, CPB1.0 and CoNLL-2009 Chinese. The results show that the framework improves over the strong baselines; with external BERT representations, it achieves F1 scores of 87.54 and 88.5 on the two test sets, respectively.
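The contrast with hard parameter sharing can be illustrated with a minimal sketch. Instead of sharing one encoder between the two tasks, the SRL module receives the parser's per-token hidden states as extra input features. Everything below is an assumption made for illustration: the function names, the toy encoder, and the choice to concatenate the parser states to the SRL input are not taken from the paper, which may combine the representations differently.

```python
from typing import Callable, List

Vector = List[float]


def srl_inputs_with_syntax(
    word_inputs: List[Vector],
    parser_encoder: Callable[[List[Vector]], List[Vector]],
) -> List[Vector]:
    """Append the parser's hidden state for each token to that token's SRL input.

    `parser_encoder` is a hypothetical stand-in for the encoder of the
    dependency parser module; it must return one vector per token.
    """
    syntax_reps = parser_encoder(word_inputs)
    if len(syntax_reps) != len(word_inputs):
        raise ValueError("parser encoder must return one vector per token")
    # List concatenation: the SRL module sees [own input ; implicit syntax].
    return [x + h for x, h in zip(word_inputs, syntax_reps)]


def toy_parser_encoder(xs: List[Vector]) -> List[Vector]:
    """Toy encoder (illustration only): one feature per token, the mean of its input."""
    return [[sum(x) / len(x)] for x in xs]


sentence = [[1.0, 3.0], [2.0, 2.0]]  # two tokens, two input features each
augmented = srl_inputs_with_syntax(sentence, toy_parser_encoder)
print(augmented)
```

In this arrangement the SRL module keeps its own parameters, and the parser influences it only through the representations it supplies.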
### Formula Representation
The paper uses several formulas to describe the model's computation, for example:
- The model is trained to maximize the probability \(P(y|s)\) of the predicate-argument-role structure \(y\in Y\) for sentence \(s\):
\[
P(y|s)=\prod_{p\in P,a\in A,r\in R}P(y(p,a,r)|s)=\prod_{p\in P,a\in A,r\in R}\frac{e^{\phi(p,a,r)}}{\sum_{r'\in R}e^{\phi(p,a,r')}}
\]
where \(\phi(p,a,r)=\phi_p(p)+\phi_a(a)+\phi_r(p,a)\) is the score of the predicate-argument-relation tuple.
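A small numeric sketch may help make the softmax over roles concrete. It computes the role distribution for a single predicate-argument pair from the three score terms. Two things here are assumptions rather than statements from this summary: the relation score is taken to be role-specific (one value per role), and a null role `"O"` with a fixed score of 0 is added, a convention used in some span-based SRL models. Without such a fixed reference score, the unary terms \(\phi_p\) and \(\phi_a\) would cancel in the normalization.

```python
import math
from typing import Dict


def role_distribution(
    phi_p: float,
    phi_a: float,
    phi_rel: Dict[str, float],
    null_role: str = "O",
) -> Dict[str, float]:
    """Softmax over roles for one (predicate, argument) pair.

    phi_p, phi_a: unary scores of the predicate and argument candidates.
    phi_rel: hypothetical role-specific relation scores, one per role.
    null_role: assumed "no relation" label whose total score is fixed to 0.
    """
    scores = {role: phi_p + phi_a + s for role, s in phi_rel.items()}
    scores[null_role] = 0.0
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {role: math.exp(v - top) for role, v in scores.items()}
    total = sum(exps.values())
    return {role: e / total for role, e in exps.items()}


dist = role_distribution(1.0, 1.0, {"ARG0": 2.0, "ARG1": 0.0})
for role, prob in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{role}: {prob:.3f}")
```

The full sentence-level probability in the formula is then the product of the selected role's probability over all predicate-argument pairs.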
- Input representation formula:
\[
x_i = \mathrm{rep}^{\mathrm{char}}_i \oplus \mathrm{emb}^{\mathrm{word}}_i \oplus \mathrm{rep}^{\mathrm{BERT}}_i
\]
where \(\oplus\) represents the concatenation operation.
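As a minimal illustration of the concatenation, the sketch below joins three per-token vectors into one input vector. The function name and the toy dimensions are assumptions; real character, word, and BERT representations would be much larger and produced by trained components.

```python
from typing import List


def build_input(
    rep_char: List[float],
    emb_word: List[float],
    rep_bert: List[float],
) -> List[float]:
    """Concatenate the three per-token representations into one input vector."""
    return rep_char + emb_word + rep_bert


# Toy vectors (illustration only): 2 + 1 + 2 features.
x_i = build_input([0.1, 0.2], [0.3], [0.4, 0.5])
print(len(x_i), x_i)
```

The dimension of the result is the sum of the three input dimensions, so each added representation widens the encoder's input.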
With these methods, the authors improve performance on Chinese SRL and offer new directions for future research on using syntax in SRL.