Abstract: Conditional Random Field (CRF) based neural models are among the most performant methods for solving sequence labeling problems. Despite its great success, CRF has the shortcoming of occasionally generating illegal sequences of tags, e.g. sequences containing an "I-" tag immediately after an "O" tag, which is forbidden by the underlying BIO tagging scheme. In this work, we propose Masked Conditional Random Field (MCRF), an easy-to-implement variant of CRF that imposes restrictions on candidate paths during both the training and decoding phases. We show that the proposed method thoroughly resolves this issue and brings consistent improvement over existing CRF-based models at near-zero additional cost.
What problem does this paper attempt to address?
The paper addresses the fact that Conditional Random Field (CRF) models occasionally generate illegal label sequences, i.e. sequences that violate the rules of tagging schemes such as BIO or BIOES. For example, in the BIO scheme an "O" label may not be directly followed by an "I-" label. Such illegal paths are prediction errors in their own right and also lower the overall performance of the model.
### Specific manifestations of the problem
1. **Definition of illegal paths**:
- In the BIO encoding scheme, certain label transitions are prohibited, such as "O" followed by "I-LOC".
- In the BIOES encoding scheme, the transition rules are stricter. For example, an "I-" label must be preceded by a "B-" or "I-" label of the same type, and the entity must be closed by an "E-" label of that type.
2. **Shortcomings of existing methods**:
- Existing methods usually rely on hand-crafted post-processing steps to repair illegal paths, such as relabeling the illegal fragments.
- Such repairs are arbitrary and lead to suboptimal performance.
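The BIO rule described above can be checked mechanically. The following is a minimal sketch, not code from the paper: the function names `is_legal_bio_transition` and `illegal_positions` are hypothetical, and it assumes that the start of a sequence behaves like an "O" tag, so a sequence may not begin with "I-".

```python
from typing import List


def is_legal_bio_transition(prev: str, cur: str) -> bool:
    """Return True if tag `cur` may directly follow tag `prev` under BIO."""
    if not cur.startswith("I-"):
        return True  # "O" and "B-*" may follow any tag
    entity_type = cur[2:]
    # "I-X" may only continue an entity of the same type X
    return prev in ("B-" + entity_type, "I-" + entity_type)


def illegal_positions(tags: List[str]) -> List[int]:
    """Return the indices of tags that form an illegal transition."""
    prev = "O"  # assumption: the sequence start is treated like "O"
    bad = []
    for i, cur in enumerate(tags):
        if not is_legal_bio_transition(prev, cur):
            bad.append(i)
        prev = cur
    return bad


# "I-LOC" after "O" and "I-LOC" after "I-PER" are both illegal.
print(illegal_positions(["O", "I-LOC", "B-PER", "I-PER", "I-LOC"]))  # [1, 4]
```

A post-processing repair step would have to decide what to do at each reported index, which is where the arbitrariness mentioned above comes from.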
### Solution
To solve these problems, the paper proposes the Masked Conditional Random Field (MCRF). MCRF prevents illegal paths from being generated in the first place by restricting the set of candidate paths during both the training and decoding stages.
### Main improvement points of MCRF
1. **Training stage**:
- Modify the loss function so that the normalization term sums only over legal paths, which removes the influence of illegal paths.
- The new loss function is:
\[
L'(W, A) := -\frac{1}{|S|} \sum_{(x,y) \in S} \log \frac{\exp(s(y,x))}{\sum_{p \in P/I} \exp(s(p,x))}
\]
- Here \(P/I\) denotes the space of all legal paths, i.e. the set of all paths \(P\) with the illegal paths \(I\) removed.
2. **Decoding stage**:
- During decoding, search for the optimal path only within the legal path space.
- The optimal path is:
\[
y'_{\text{opt}} = \arg\max_{p \in P/I} s(p, x_{\text{test}}, W'_{\text{opt}}, A'_{\text{opt}})
\]
3. **Implementation details**:
- Use a masked transition matrix \(\bar{A}(c)\) in which the scores of illegal transitions are replaced by \(c\), where \(c \ll 0\) is a large negative constant.
- After each parameter update, reset the weights of illegal transitions to \(c\) so that they stay masked.
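The three points above can be sketched together in plain Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: all function names are hypothetical, the value `C = -1e4` is an assumed choice for the constant \(c\), the tag set is BIO, and a sequence is assumed not to start with an "I-" tag. Setting illegal transition scores to \(c\) makes their contribution to the forward-algorithm sum underflow to zero, so the partition term effectively ranges over legal paths only, and Viterbi decoding never selects them.

```python
import math

C = -1e4  # assumed value for the large negative constant c


def bio_legal(prev, cur):
    """Under BIO, "I-X" may only follow "B-X" or "I-X"."""
    return not cur.startswith("I-") or prev in ("B-" + cur[2:], "I-" + cur[2:])


def mask_transitions(tags, A, c=C):
    """Copy A, replacing illegal transition scores by c.

    Re-applying this after each parameter update keeps illegal entries at c.
    """
    n = len(tags)
    return [[A[i][j] if bio_legal(tags[i], tags[j]) else c for j in range(n)]
            for i in range(n)]


def mask_start(tags, c=C):
    """Assumption: a sequence may not begin with an "I-" tag."""
    return [c if t.startswith("I-") else 0.0 for t in tags]


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(emissions, A, start):
    """Forward algorithm: log of the summed exp-scores over all paths."""
    n = len(A)
    alpha = [start[j] + emissions[0][j] for j in range(n)]
    for e in emissions[1:]:
        alpha = [logsumexp([alpha[i] + A[i][j] for i in range(n)]) + e[j]
                 for j in range(n)]
    return logsumexp(alpha)


def path_score(path, emissions, A, start):
    """Score s(p, x) of one path: start + emission + transition scores."""
    s = start[path[0]] + emissions[0][path[0]]
    for t in range(1, len(path)):
        s += A[path[t - 1]][path[t]] + emissions[t][path[t]]
    return s


def viterbi(emissions, A, start):
    """Return the highest-scoring path as a list of tag indices."""
    n = len(A)
    score = [start[j] + emissions[0][j] for j in range(n)]
    backptrs = []
    for e in emissions[1:]:
        best = [max(range(n), key=lambda i: score[i] + A[i][j]) for j in range(n)]
        score = [score[best[j]] + A[best[j]][j] + e[j] for j in range(n)]
        backptrs.append(best)
    path = [max(range(n), key=lambda j: score[j])]
    for best in reversed(backptrs):
        path.append(best[path[-1]])
    return path[::-1]


# Toy example: emissions prefer "I-LOC" right after "O".
tags = ["O", "B-LOC", "I-LOC"]
emissions = [[3.0, 0.0, 0.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]]
zero_A = [[0.0] * 3 for _ in range(3)]
zero_start = [0.0] * 3
masked_A = mask_transitions(tags, zero_A)
masked_start = mask_start(tags)

unmasked_path = [tags[j] for j in viterbi(emissions, zero_A, zero_start)]
masked_path = [tags[j] for j in viterbi(emissions, masked_A, masked_start)]
print(unmasked_path)  # ['O', 'I-LOC', 'I-LOC']  (illegal)
print(masked_path)    # ['O', 'B-LOC', 'I-LOC']

# Per-sentence masked loss for the gold path O, B-LOC, I-LOC.
gold = [0, 1, 2]
masked_loss = (log_partition(emissions, masked_A, masked_start)
               - path_score(gold, emissions, masked_A, masked_start))
unmasked_loss = (log_partition(emissions, zero_A, zero_start)
                 - path_score(gold, emissions, zero_A, zero_start))
```

In this toy case the masked loss is smaller than the unmasked one, because probability mass is no longer spent on illegal paths. In a real model the masking would be applied to learned transition parameters inside an autodiff framework rather than to nested lists.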
### Experimental results
The paper verifies the effectiveness of MCRF through experiments on multiple datasets:
- **Chinese Named Entity Recognition (NER)**: MCRF achieves new best results on multiple Chinese NER datasets.
- **Slot-filling task**: MCRF significantly outperforms the baseline model on the ATIS and SNIPS datasets.
- **Chunking task**: MCRF also performs well on the CoNLL2000 chunking task.
In conclusion, by introducing a path-masking mechanism in both the training and decoding stages, MCRF eliminates the illegal path problem and consistently improves model performance.