GLexicon: Glyph and Lexicon-based Embedding Model for Chinese NER

Pengnian Qi, Peng Li, Biao Qin
DOI: https://doi.org/10.1109/icaibd57115.2023.10206317
2023-01-01
Abstract: Recently, many works have shown that incorporating word lexicons is effective for Chinese named entity recognition (NER). However, previous works fuse only lexicon information and ignore two important characteristics of the Chinese language, glyph and pinyin, which carry significant syntactic and semantic information for sequence tagging tasks. This paper proposes GLexicon, which incorporates both lexicon features and Chinese language features (glyph and pinyin) into a character-based NER model. Compared with existing methods, GLexicon fully leverages the lexicon information and deeply fuses Chinese glyph and pinyin features. Specifically, we first design an embedding scheme that preserves as many lexicon-matching results as possible in the character representations. Then, we deeply fuse Chinese glyph and pinyin information into the lexicon-based model to enhance its expressiveness on the Chinese NER task. Experimental results on four public Chinese NER datasets show that GLexicon achieves state-of-the-art performance.
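The abstract does not spell out how GLexicon preserves lexicon-matching results in character representations. As a hedged illustration only, the sketch below follows the BMES word-set idea popularized by SoftLexicon (Ma et al., 2020): every lexicon word matched in the sentence is attached to each character it covers, grouped by whether that character Begins, is in the Middle of, Ends, or is a Single-character word. This is an assumption about the general family of techniques, not the paper's method; the function name `match_lexicon_bmes`, the `max_len` cap, and the toy lexicon are hypothetical.

```python
from typing import Dict, List, Set


def match_lexicon_bmes(
    sentence: str, lexicon: Set[str], max_len: int = 4
) -> List[Dict[str, Set[str]]]:
    """Hypothetical helper: for each character, collect the lexicon words
    matched in the sentence, grouped by the character's position in the word.
    B = word begins here, M = character is inside the word,
    E = word ends here, S = single-character word."""
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        # Enumerate every substring starting at i, up to max_len characters.
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                sets[i]["S"].add(word)
            else:
                sets[i]["B"].add(word)
                sets[j - 1]["E"].add(word)
                for k in range(i + 1, j - 1):
                    sets[k]["M"].add(word)
    return sets


if __name__ == "__main__":
    # Toy lexicon (hypothetical) for the classic ambiguous phrase
    # "Nanjing Yangtze River Bridge".
    toy_lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥", "江"}
    for ch, groups in zip("南京市长江大桥", match_lexicon_bmes("南京市长江大桥", toy_lexicon)):
        print(ch, {tag: sorted(words) for tag, words in groups.items()})
```

Keeping all four groups, rather than picking one segmentation, is what lets a character such as 江 retain every competing word it participates in. In lexicon-based models of this kind, each group is typically pooled into a vector and concatenated with the character embedding before the sequence encoder.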