Abstract: Integrating lexical information into Chinese character embeddings is an effective way to address Chinese named entity recognition (NER). However, most existing methods focus only on discovering named entity boundaries and consider only the words matched by each Chinese character. They ignore the association between a character and its left- and right-matching words, as well as the local semantic information in the character’s neighborhood, which is crucial for Chinese NER. Chinese contains a large number of polysemous words, that is, words with multiple meanings. Without sufficient contextual information, the intended meaning of a text can therefore be hard to determine, which leads to ambiguity. We consider how to handle, more simply and effectively, the entity ambiguity that polysemous words cause in Chinese texts across different contexts. We propose using graph attention networks to construct relations between matching words and neighboring characters, and among the matching words themselves, and adding left- and right-matching words so that the semantic information provided by the local lexicon is used directly. Moreover, we propose a short-sequence convolutional neural network (SSCNN), which uses the shorter subsequences generated and encoded by a sliding window module to enhance the perception of local information about each character. Compared with widely used Chinese NER models, our approach achieves improvements of 1.18%, 0.29%, 0.18%, and 1.1% on the four benchmark datasets Weibo, Resume, OntoNotes, and E-commerce, respectively, demonstrating the effectiveness of the model.
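
The sliding window idea behind the SSCNN can be illustrated with a minimal sketch. It only shows how a sentence might be cut into one short subsequence per character; the function name `sliding_subsequences`, the window size of 3, the `<pad>` token, and the example sentence are assumptions made for illustration and are not taken from the paper, which does not specify these details here.

```python
from typing import List


def sliding_subsequences(
    chars: List[str], window: int = 3, pad: str = "<pad>"
) -> List[List[str]]:
    """Return one length-`window` subsequence centred on each character.

    Hypothetical helper: the window size and padding scheme are assumptions.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Pad both ends so that boundary characters also get a full window.
    padded = [pad] * half + list(chars) + [pad] * half
    return [padded[i:i + window] for i in range(len(chars))]


# Example sentence chosen for illustration only.
sentence = list("南京市长江大桥")
subsequences = sliding_subsequences(sentence, window=3)
for char, sub in zip(sentence, subsequences):
    print(char, sub)
```

In a full model, each short subsequence would presumably be embedded and passed through a convolutional encoder to produce a local representation for its centre character; that step is omitted here because the abstract gives no architectural details.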

CMRight: Chinese Morph Resolution Based on End-to-end Model Combined with Enhancement Algorithms

Morpholog-Processing in Chinese-Mongolian Statistical Machine Translation

Semorph: A Morphology Semantic Enhanced Pre-trained Model for Chinese Spam Text Detection

A Local Information Perception Enhancement–Based Method for Chinese NER

MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection

A Phrase-Based Statistical Model for SMS Text Normalization

A Hybrid Model for Computational Morphology Application

Text Representation Model for Multiple Language Forms in Spoken Chinese Expression

Context-Aware Entity Morph Decoding

Named entity translation method based on machine translation lexicon

Combine CRF and MMSEG to Boost Chinese Word Segmentation in Social Media

A morphology-based Chinese word segmentation method

Chinese-English Smt for Cross-Language Dialogue Agent Support

EmojiLM: Modeling the New Emoji Language

A text matching model based on dynamic multi‐mask and augmented adversarial

Input Normalization for an English-to-Chinese SMS Translation System

Automatic Recognition of Chinese Unknown Word for Single-Character and Affix Models

Uyghur-Chinese statistical machine translation by incorporating morphological information

Kcr-FLAT: A Chinese-Named Entity Recognition Model with Enhanced Semantic Information

SSMI: Semantic Similarity and Mutual Information Maximization Based Enhancement for Chinese NER

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition