Abstract: Integrating lexical information into Chinese character embeddings is an effective approach to Chinese named entity recognition (NER). However, most existing methods focus only on detecting named entity boundaries and consider only the lexicon words that the Chinese characters match. They ignore the association between each Chinese character and its left- and right-matching words, as well as the local semantic information in the character's neighborhood, which is crucial for Chinese NER. Chinese also contains a large number of polysemous words, that is, words that can carry multiple meanings. Without sufficient context, readers may have difficulty grasping the intended meaning of a text, which gives rise to ambiguity. We therefore consider how to resolve, more simply and effectively, the entity ambiguity that polysemous words cause across different contexts in Chinese text. In this paper, we propose using graph attention networks to build relations between matching words and neighboring characters, as well as among the matching words themselves, and to add left- and right-matching words directly using the semantic information provided by the local lexicon. Moreover, we propose a short-sequence convolutional neural network (SSCNN), which encodes the shorter subsequences generated by a sliding-window module to enhance the perception of local information around each character. Compared with widely used Chinese NER models, our approach achieves improvements of 1.18%, 0.29%, 0.18%, and 1.1% on the four benchmark datasets Weibo, Resume, OntoNotes, and E-commerce, respectively, demonstrating the effectiveness of the model.
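The abstract describes its local-information components only at a high level. As an illustration, the following is a minimal Python sketch of the input side of such a pipeline, written under stated assumptions rather than taken from the paper. `match_words` collects, for each character, the lexicon words that end at it (assumed here to be its left-matching words) and those that start at it (assumed to be its right-matching words). `sliding_windows` cuts a short, padded subsequence centred on each character, in the spirit of the sliding-window module that feeds SSCNN. `conv_window` applies a single hypothetical convolution filter with ReLU to one window. All function names, the `<pad>` token, the window size, the `max_len` limit, and the toy lexicon are hypothetical, and the graph attention network is not modeled.

```python
from typing import List, Sequence, Set, Tuple

PAD = "<pad>"  # hypothetical padding token for window edges


def match_words(sentence: str, lexicon: Set[str],
                max_len: int = 4) -> Tuple[List[List[str]], List[List[str]]]:
    """Collect, for every character, the lexicon words that end at it
    (assumed left-matching) and the words that start at it (assumed
    right-matching).

    Assumption: single characters are not treated as words, and words
    longer than `max_len` characters are ignored.
    """
    n = len(sentence)
    left: List[List[str]] = [[] for _ in range(n)]
    right: List[List[str]] = [[] for _ in range(n)]
    for start in range(n):
        for end in range(start + 2, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                right[start].append(word)    # word begins at this character
                left[end - 1].append(word)   # word ends at this character
    return left, right


def sliding_windows(tokens: Sequence[str], window: int = 3) -> List[List[str]]:
    """Return one short subsequence of length `window` centred on each token."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    padded = [PAD] * half + list(tokens) + [PAD] * half
    # padded[i + half] is tokens[i], so this slice is centred on tokens[i]
    return [padded[i:i + window] for i in range(len(tokens))]


def conv_window(window_vecs: Sequence[Sequence[float]],
                kernel: Sequence[Sequence[float]], bias: float = 0.0) -> float:
    """Apply one convolution filter spanning the whole window, then ReLU."""
    total = bias
    for vec, weights in zip(window_vecs, kernel):
        total += sum(v * w for v, w in zip(vec, weights))
    return max(0.0, total)


if __name__ == "__main__":
    # Toy example: a classic ambiguous phrase and a hypothetical lexicon
    sentence = "南京市长江大桥"
    lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "江大桥", "大桥"}
    left, right = match_words(sentence, lexicon)
    for i, ch in enumerate(sentence):
        print(ch, "left:", left[i], "right:", right[i])
    print(sliding_windows(list(sentence), window=3)[3])
```

With the ambiguous phrase 南京市长江大桥 and the toy lexicon in the example, the character 长 (index 3) gets `['市长']` as its left-matching word and `['长江', '长江大桥']` as its right-matching words, so both competing segmentations reach the same character. This is the kind of conflict that local context has to resolve. Its size-3 window is `['市', '长', '江']`.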

2kenize: Tying Subword Sequences for Chinese Script Conversion

Sub-Character Tokenization for Chinese Pretrained Language Models

Exploiting Word Semantics to Enrich Character Representations of Chinese Pre-trained Models

A Hybrid Word-Character Approach to Abstractive Summarization

A Local Information Perception Enhancement–Based Method for Chinese NER

Combining character-based bigrams with word-based bigrams in contextual postprocessing for Chinese script recognition

A Chinese text classification model based on radicals and character distinctions

When is Char Better Than Subword: A Systematic Study of Segmentation Algorithms for Neural Machine Translation

Multi-level Linguistic Knowledge Based Chinese Grapheme-to-Phoneme Conversion

Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling

A Deep Convolutional Neural Model for Character-Based Chinese Word Segmentation

SubCharacter Chinese-English Neural Machine Translation with Wubi encoding

Braille to Print Translations for Chinese

Korean-to-Chinese Machine Translation using Chinese Character as Pivot Clue

CharSS: Character-Level Transformer Model for Sanskrit Word Segmentation

A New Approach to Accent Recognition and Conversion for Mandarin Chinese

Using Chinese Glyphs for Named Entity Recognition

Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models

Chinese Syllable-to-character Conversion with Recurrent Neural Network Based Supervised Sequence Labelling

MCSSpell: Optimal Path Selection of Candidate Characters by Integrating Multimodal Information and Copy Mechanism for Chinese Spelling Correction

Sense-Aware Decoder for Character Based Japanese-Chinese NMT