Abstract: Integrating lexical information into Chinese character embeddings is an effective way to address Chinese named entity recognition (NER). However, most existing methods focus only on discovering named entity boundaries and consider only the words matched by each Chinese character. They ignore both the association between a character and its left and right matching words and the local semantic information in the character’s neighborhood, which is crucial for Chinese NER. Chinese contains a large number of polysemous words, that is, single words with multiple meanings. Consequently, without sufficient contextual information, the intended meaning of a text can be hard to determine, which gives rise to ambiguity. We consider how to handle, more simply and effectively, the entity ambiguity caused by polysemous words that appear in different contexts in Chinese text. We propose using graph attention networks to construct relations between matching words and neighboring characters, as well as among matching words, and adding left- and right-matching words directly using the semantic information provided by the local lexicon. Moreover, we propose a short-sequence convolutional neural network (SSCNN), which uses the shorter subsequences generated and encoded by the sliding window module to enhance the perception of local information around each character. Compared with widely used Chinese NER models, our approach achieves improvements of 1.18%, 0.29%, 0.18%, and 1.1% on the four benchmark datasets Weibo, Resume, OntoNotes, and E-commerce, respectively, which demonstrates the effectiveness of the model.
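To make the sliding-window idea concrete, the following is a minimal sketch in plain Python, not the paper's SSCNN. It assumes a window of odd size centred on each character, a hypothetical `<pad>` token at the sentence edges, and a single convolution kernel followed by max pooling; the abstract specifies none of these details, and the names `sliding_windows` and `conv_max_pool` are illustrative only.

```python
from typing import List, Sequence


def sliding_windows(chars: Sequence[str], size: int = 3,
                    pad: str = "<pad>") -> List[List[str]]:
    """Return one short subsequence of `size` items centred on each character.

    The window size and the padding token are assumptions for illustration.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd number")
    half = size // 2
    padded = [pad] * half + list(chars) + [pad] * half
    return [padded[i:i + size] for i in range(len(chars))]


def conv_max_pool(window: Sequence[Sequence[float]],
                  kernel: Sequence[Sequence[float]]) -> float:
    """Slide one kernel over a window of embedding vectors and max-pool.

    `window` holds one vector per position; `kernel` holds one weight
    vector per kernel position, with the same embedding dimension.
    """
    k = len(kernel)
    if k == 0 or len(window) < k:
        raise ValueError("kernel must be non-empty and no wider than the window")
    responses = []
    for start in range(len(window) - k + 1):
        total = 0.0
        for offset in range(k):
            # Dot product between the kernel row and the aligned vector.
            total += sum(w * x for w, x in zip(kernel[offset], window[start + offset]))
        responses.append(total)
    return max(responses)


# Illustrative input: a commonly cited ambiguous Chinese string.
windows = sliding_windows(list("南京市长江大桥"), size=3)

# Toy 2-dimensional "embeddings" for one window and a width-2 kernel.
toy_window = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_kernel = [[1.0, 1.0], [1.0, 1.0]]
local_feature = conv_max_pool(toy_window, toy_kernel)
```

Here each character receives its own short subsequence, so the feature computed for a character depends only on its immediate neighbours. In a real model the toy vectors would be learned character embeddings and the kernel weights would be trained.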

A Local Information Perception Enhancement–Based Method for Chinese NER
