Bidirectional LSTM-CRF Models for Sequence Tagging

Zhiheng Huang, Wei Xu, Kai Yu

DOI: https://doi.org/10.48550/arXiv.1508.01991

2015-08-09

Abstract: In this paper, we propose a variety of Long Short-Term Memory (LSTM) based models for sequence tagging. These models include LSTM networks, bidirectional LSTM (BI-LSTM) networks, LSTM with a Conditional Random Field (CRF) layer (LSTM-CRF) and bidirectional LSTM with a CRF layer (BI-LSTM-CRF). Our work is the first to apply a bidirectional LSTM CRF (denoted as BI-LSTM-CRF) model to NLP benchmark sequence tagging data sets. We show that the BI-LSTM-CRF model can efficiently use both past and future input features thanks to a bidirectional LSTM component. It can also use sentence level tag information thanks to a CRF layer. The BI-LSTM-CRF model can produce state of the art (or close to) accuracy on POS, chunking and NER data sets. In addition, it is robust and has less dependence on word embedding as compared to previous observations.

Computation and Language

What problem does this paper attempt to address?

This paper addresses sequence tagging in natural language processing (NLP), specifically part-of-speech (POS) tagging, chunking, and named entity recognition (NER). It proposes a family of models based on Long Short-Term Memory (LSTM) networks: unidirectional LSTM, bidirectional LSTM (BI-LSTM), LSTM with a Conditional Random Field (CRF) layer (LSTM-CRF), and bidirectional LSTM with a CRF layer (BI-LSTM-CRF). The main contribution is the first application of the BI-LSTM-CRF model to NLP benchmark sequence tagging datasets. The bidirectional LSTM component lets the model use both past input features (through forward states) and future input features (through backward states), while the CRF layer lets it use sentence-level tag information. In the reported experiments, BI-LSTM-CRF reaches state-of-the-art or near-state-of-the-art accuracy on POS, chunking, and NER datasets. The model is also robust and depends less on word embeddings than earlier work had observed.
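To make the role of the CRF layer more concrete, here is a minimal sketch of how sentence-level tag information can change a prediction. It assumes per-token tag scores are already available (in a BI-LSTM-CRF these would come from the bidirectional LSTM; here they are hand-written numbers) and that the CRF contributes a tag-to-tag transition score matrix. The function name `viterbi_decode`, the three-tag scheme, and all score values are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the tag sequence with the highest total score.

    emissions[t][j]   : score of tag j at position t (stand-in for BI-LSTM output)
    transitions[i][j] : score of moving from tag i to tag j (stand-in for CRF parameters)
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    scores = list(emissions[0])      # best score of any path ending in each tag
    backpointers = []                # backpointers[t-1][j] = best previous tag for tag j at t

    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(num_tags):
            # Pick the previous tag that maximizes path score plus transition score.
            best_prev = max(range(num_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_prev] + transitions[best_prev][j]
                              + emissions[t][j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final tag.
    path = [max(range(num_tags), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag ids: 0 = O, 1 = B, 2 = I.
emissions = [[2.0, 1.5, 0.0],   # token 1: O scores slightly higher than B
             [0.0, 0.5, 2.0]]   # token 2: I scores highest
no_transitions = [[0.0] * 3 for _ in range(3)]
crf_transitions = [[0.0, 0.0, -10.0],   # O -> I is heavily penalized
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]]

print(viterbi_decode(emissions, no_transitions))   # [0, 2], i.e. O I
print(viterbi_decode(emissions, crf_transitions))  # [1, 2], i.e. B I
```

Without transition scores, each token simply takes its own best tag, which yields the inconsistent sequence O I. With the penalty on O followed by I, the decoder prefers B I, which is the kind of sentence-level constraint the summary attributes to the CRF layer.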

Cross-Lingual Text Image Recognition Via Multi-Task Sequence to Sequence Learning.

Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling Sequence Data?

Ancient Chinese Sentence Segmentation Based on Bidirectional LSTM+CRF Model

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

Bidirectional LSTM with attention mechanism and convolutional layer for text classification

Multi-Task Cross-Lingual Sequence Tagging from Scratch

Bi-LSTM-CRF Sequence Labeling for Keyphrase Extraction from Scholarly Documents

Legal Text Recognition Using LSTM-CRF Deep Learning Model

Embedded-State Latent Conditional Random Fields for Sequence Labeling

Incorporating Dictionary-Based Word Representation into Neural Network for Sequence Tagging

Long short-term memory RNN for biomedical named entity recognition

Hybrid Semi-Markov CRF for Neural Sequence Labeling.

Bidirectional LSTM-CRF Attention-based Model for Chinese Word Segmentation

Chinese Spelling Check via Bidirectional LSTM-CRF

Post Text Processing of Chinese Speech Recognition Based on Bidirectional LSTM Networks and CRF

Applications of BERT Based Sequence Tagging Models on Chinese Medical Text Attributes Extraction

A Tree Search Algorithm for Sequence Labeling

Segment-Level Sequence Modeling Using Gated Recursive Semi-Markov Conditional Random Fields

Chinese Semantic Role Labeling with Bidirectional Recurrent Neural Networks