Neural CRF transducers for sequence labeling

Kai Hu, Zhijian Ou, Min Hu, Junlan Feng
DOI: https://doi.org/10.48550/arXiv.1811.01382
2018-11-04
Abstract: Conditional random fields (CRFs) have been shown to be one of the most successful approaches to sequence labeling. Various linear-chain neural CRFs (NCRFs) have been developed to implement the non-linear node potentials in CRFs while still keeping the linear-chain hidden structure. In this paper, we propose NCRF transducers, which consist of two RNNs, one extracting features from observations and the other capturing (theoretically infinite) long-range dependencies between labels. Different sequence labeling methods are evaluated over POS tagging, chunking and NER (English, Dutch). Experimental results show that NCRF transducers achieve consistent improvements over linear-chain NCRFs and RNN transducers across all four tasks, and can improve state-of-the-art results.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
The paper addresses a limitation of linear-chain conditional random fields (linear-chain CRFs) and their neural extensions (NCRFs): in sequence labeling tasks they capture only first-order dependencies between labels and ignore potential long-range dependencies. This limitation can hurt performance in practical applications.

To address it, the authors propose a new model, **neural CRF transducers (NCRF transducers)**, which combines two recurrent neural networks (RNNs):

1. **Feature-extraction RNN**: extracts features from the input sequence.
2. **Prediction RNN**: captures long-range dependencies between labels.

In this way, the NCRF transducer can model long-range dependencies between labels while remaining globally normalized, which improves performance on multiple sequence labeling tasks.

### Main contributions

- **Long-range dependency modeling**: Unlike linear-chain NCRFs, the NCRF transducer can capture long-range dependencies between labels and can, in theory, model dependencies of unbounded length.
- **Global normalization**: Unlike locally normalized RNN transducers, the NCRF transducer is globally normalized, which avoids the label bias and exposure bias problems.
- **Experimental verification**: Experiments on part-of-speech (POS) tagging, chunking, and named entity recognition (NER) in English and Dutch show that the NCRF transducer improves consistently on these tasks and achieves state-of-the-art results.

### Experimental results

The NCRF transducer outperforms linear-chain NCRFs and RNN transducers on all four tasks, with a particularly notable improvement on named entity recognition. For example, on the CoNLL-2003 English NER task, the NCRF transducer achieves an F1 score of 92.36, exceeding the previous best result.

Overall, the paper aims to improve existing sequence labeling methods by combining long-range dependency modeling with global normalization, thereby improving model performance on a range of natural language processing tasks.
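
The two-RNN, globally normalized design summarized above can be illustrated with a small sketch. This is not the authors' implementation: the class name `ToyNCRFTransducer`, the layer sizes, the unidirectional plain-tanh recurrences, and the random untrained weights are all assumptions made for illustration. The sketch scores a label sequence as a sum of per-position potentials, each computed from a feature-RNN state (over the observations) and a label-RNN state (over all previous labels), and normalizes over every possible label sequence. The partition function is computed by brute-force enumeration, which is feasible only at toy sizes; how the paper handles training and decoding is not covered in this summary and is not reproduced here.

```python
import itertools
import math
import random


def tanh_layer(W, x):
    """Return tanh(W @ x) for a list-of-rows matrix W and a vector x."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyNCRFTransducer:
    """Toy two-RNN scorer (hypothetical; random, untrained weights)."""

    def __init__(self, num_labels, obs_dim, hidden=4, seed=0):
        rng = random.Random(seed)
        self.num_labels = num_labels
        self.hidden = hidden
        # Feature RNN reads [x_i ; h_{i-1}].
        self.W_feat = rand_matrix(rng, hidden, obs_dim + hidden)
        # Label RNN reads [one_hot(y_{i-1}) ; g_{i-1}]; index num_labels is a start symbol.
        self.W_label = rand_matrix(rng, hidden, num_labels + 1 + hidden)
        # Output layer maps [h_i ; g_i] to one potential per label.
        self.W_out = rand_matrix(rng, num_labels, 2 * hidden)

    def feature_states(self, xs):
        h = [0.0] * self.hidden
        states = []
        for x in xs:
            h = tanh_layer(self.W_feat, list(x) + h)
            states.append(h)
        return states

    def score(self, xs, ys):
        """Unnormalized log-score: sum over i of phi_i(y_i; y_{<i}, x)."""
        g = [0.0] * self.hidden
        prev = self.num_labels  # start symbol
        total = 0.0
        for h, y in zip(self.feature_states(xs), ys):
            one_hot = [1.0 if k == prev else 0.0 for k in range(self.num_labels + 1)]
            # g summarizes the whole label history y_{<i}, not just y_{i-1}.
            g = tanh_layer(self.W_label, one_hot + g)
            phi = [sum(w * v for w, v in zip(row, h + g)) for row in self.W_out]
            total += phi[y]
            prev = y
        return total

    def log_partition(self, xs):
        """log Z(x) by enumerating all label sequences (toy sizes only)."""
        scores = [
            self.score(xs, ys)
            for ys in itertools.product(range(self.num_labels), repeat=len(xs))
        ]
        m = max(scores)
        return m + math.log(sum(math.exp(s - m) for s in scores))

    def log_prob(self, xs, ys):
        """Globally normalized log p(y | x) = score(x, y) - log Z(x)."""
        return self.score(xs, ys) - self.log_partition(xs)


model = ToyNCRFTransducer(num_labels=3, obs_dim=2)
xs = [(0.1, -0.3), (0.7, 0.2), (-0.4, 0.9), (0.0, 0.5)]
total = sum(
    math.exp(model.log_prob(xs, ys))
    for ys in itertools.product(range(3), repeat=len(xs))
)
print(total)  # probabilities of all 3**4 label sequences sum to 1 up to float error
```

Because the label RNN state depends on the entire label prefix, the potentials do not factor into pairwise terms, which is why the usual linear-chain forward algorithm does not apply and the sketch falls back on enumeration.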