Abstract: This paper proposes an improved LSTM-based neural network structure for recognizing entities in sentences. LSTM-based deep learning models suffer from a label bias problem in sequence labeling tasks because they do not use the label information available at the model's output layer. This paper therefore adds a layer of neurons on top of the output layer to simulate the hidden state of a CRF and make full use of the label information that the output layer contains. In addition, an LSTM-based sequence labeling model cannot ensure that the recognized entities exist in the knowledge base, and it has difficulty recognizing word boundaries, so this paper introduces an entity dictionary into the model. Finally, part of speech is a core feature of an entity, but the character vectors in LSTM-based sequence labeling models lack part-of-speech information, so this paper modifies the character vector to incorporate part-of-speech features. Experiments show that the improved LSTM-based model achieves good results in entity mention recognition.

Research on Entity Mention Recognition Based on LSTM