A Novel Character-Word Fusion Chinese Named Entity Recognition Model Based on Attention Mechanism

Jinshang Luo, Xinchun Zou, Mengshu Hou
DOI: https://doi.org/10.1109/ccet55412.2022.9906333
2022-01-01
Abstract: Named Entity Recognition (NER) is a fundamental task in natural language processing. Compared with English NER, Chinese NER is harder because of word segmentation ambiguity and polysemy. To address these issues, the paper proposes a novel character-word fusion Long Short-Term Memory (LSTM) model combined with a sentence-level attention mechanism (CWSA-LSTM). First, the method encodes characters and words with pretrained models. Word information is incorporated into the character sequence by matching potential words against a lexicon. The resulting feature vectors are then fed into the LSTM layer to learn contextual information, and the attention mechanism captures how tightly the elements of the sentence are correlated. Experiments on benchmark datasets demonstrate that CWSA-LSTM outperforms other state-of-the-art methods and verify the effectiveness of character-word fusion. On the MSRA dataset, CWSA-LSTM achieves a 2.46% improvement in F1 score over the Lattice LSTM baseline.
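The lexicon-matching step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the matching procedure, so everything here is an assumption made for illustration: the function name `match_lexicon_words`, the `max_word_len` cutoff, the choice to attach each matched word to the character where it ends, and the toy lexicon. It is not the authors' implementation.

```python
from typing import Dict, List, Set


def match_lexicon_words(
    sentence: str, lexicon: Set[str], max_word_len: int = 4
) -> Dict[int, List[str]]:
    """Map each character index to the lexicon words that end at that index.

    Hypothetical helper: the paper's actual matching rule is not given
    in the abstract.
    """
    matches: Dict[int, List[str]] = {i: [] for i in range(len(sentence))}
    for start in range(len(sentence)):
        # Consider spans of 2..max_word_len characters beginning at `start`.
        last_end = min(len(sentence), start + max_word_len)
        for end in range(start + 2, last_end + 1):
            candidate = sentence[start:end]
            if candidate in lexicon:
                matches[end - 1].append(candidate)
    return matches


# Toy lexicon (assumed) and a classic segmentation-ambiguity sentence.
toy_lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
sentence = "南京市长江大桥"
matches = match_lexicon_words(sentence, toy_lexicon)

for index, char in enumerate(sentence):
    print(index, char, matches[index])
```

In this toy sentence both "南京市" and "市长" are matched even though they overlap, which is the kind of segmentation ambiguity the abstract refers to. A fusion model could then combine the embeddings of the matched words with the character embedding at each position before the LSTM layer; how CWSA-LSTM does this is not described in the abstract.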