Exploring Segment Representations for Neural Semi-Markov Conditional Random Fields

Yijia Liu, Wanxiang Che, Bing Qin, Ting Liu
DOI: https://doi.org/10.1109/TASLP.2020.2964960
2020-01-01
Abstract: Many problems in natural language processing (NLP) can be cast as the problem of segmenting a sequence. In this article, we combine semi-Markov conditional random fields (semi-CRF) with neural networks to solve NLP segmentation problems. We focus on the segment representation in neural semi-CRF, which is important to performance. Based on our preliminary work in Liu et al. [1], we represent a segment by both encoding the subsequence and embedding the segment string. We conduct a systematic study of the utility of various components in subsequence encoding and propose a method of constructing and deriving segment string embeddings. We conduct extensive experiments on three typical segmentation problems: shallow syntax parsing, named entity recognition, and Chinese word segmentation. The results show that a concatenation network achieves subsequence encoding of comparable performance while running three times faster than previous work. The results also show that the segment string embeddings help our neural semi-CRF model achieve a macro-averaged error reduction of 13.15% over a strong baseline that uses deep contextualized embeddings and a bidirectional long short-term memory CRF, which also shows the usefulness of semi-CRF even with contextualized embeddings. These results are competitive with state-of-the-art segmentation systems.
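To make the segmentation setting concrete, the following is a minimal sketch of decoding with segment-level scores, as a semi-CRF does. It is not the authors' implementation: it assumes a zeroth-order model (no label transition scores), and the names `semi_markov_viterbi`, `toy_score`, and `string_bonus`, along with all numeric values, are hypothetical. The toy scorer stands in for the two ingredients the abstract describes: a score composed from the subsequence and a score looked up from the segment string.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def semi_markov_viterbi(
    n: int,
    labels: List[str],
    max_len: int,
    score: Callable[[int, int, str], float],
) -> List[Segment]:
    """Return the highest-scoring segmentation of n tokens.

    Assumes a zeroth-order model: the total score is the sum of
    independent segment scores, with segments of at most max_len tokens.
    """
    # best[j] is the best total score of any segmentation of tokens[0:j].
    best = [float("-inf")] * (n + 1)
    back: List[Optional[Segment]] = [None] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            for y in labels:
                candidate = best[i] + score(i, j, y)
                if candidate > best[j]:
                    best[j] = candidate
                    back[j] = (i, j, y)
    # Follow back-pointers from the end of the sequence to the start.
    segments: List[Segment] = []
    j = n
    while j > 0:
        seg = back[j]
        assert seg is not None
        segments.append(seg)
        j = seg[0]
    segments.reverse()
    return segments


# Hypothetical toy data: the values below are illustrative, not from the paper.
tokens = ["New", "York", "is", "big"]
string_bonus = {("New York", "ENT"): 2.0}  # stand-in for a segment string embedding


def toy_score(i: int, j: int, y: str) -> float:
    text = " ".join(tokens[i:j])
    # Stand-in for a score computed by encoding the subsequence tokens[i:j].
    composed = 0.5 if (y == "O" and j - i == 1) else 0.0
    return composed + string_bonus.get((text, y), 0.0)


result = semi_markov_viterbi(len(tokens), ["ENT", "O"], 2, toy_score)
print(result)
```

In a neural semi-CRF the `score` callable would be produced by a network over the segment representation; here it is a lookup so that the dynamic program can be followed by hand.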