Abstract: Integrating lexical information into Chinese character embeddings is an effective way to address Chinese named entity recognition (NER). However, most existing methods focus only on discovering named entity boundaries and consider only the words matched by each Chinese character. They ignore the association between a character and its left- and right-matching words, as well as the local semantic information in the character's neighborhood, which is crucial for Chinese NER. Chinese contains a large number of polysemous words, that is, words with multiple meanings. Without sufficient context, the intended meaning of a text can therefore be hard to determine, which gives rise to ambiguity. We consider how to handle, more simply and effectively, the entity ambiguity caused by polysemous words that appear in different contexts in Chinese text. In this paper, we propose using graph attention networks to construct relations among matching words and neighboring characters, as well as among the matching words themselves, and to add left- and right-matching words directly, using the semantic information provided by the local lexicon. Moreover, we propose a short-sequence convolutional neural network (SSCNN), which uses the shorter subsequences generated and encoded by a sliding window module to enhance the perception of local information about each character. Compared with widely used Chinese NER models, our approach achieves improvements of 1.18%, 0.29%, 0.18%, and 1.1% on the four benchmark datasets Weibo, Resume, OntoNotes, and E-commerce, respectively, demonstrating the effectiveness of the model.
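As an illustration of the sliding window idea mentioned in the abstract, the following minimal Python sketch produces one short subsequence per character, centered on that character. The abstract does not give the window size, stride, or padding scheme, so the function name `sliding_subsequences`, the default `window=3`, and the `<pad>` token are all assumptions made for illustration, not the authors' implementation. In the SSCNN, each subsequence would presumably be embedded and passed through a convolutional layer; that step is omitted here.

```python
from typing import List


def sliding_subsequences(
    chars: List[str], window: int = 3, pad: str = "<pad>"
) -> List[List[str]]:
    """Return one short subsequence per character, centered on that character.

    Assumptions (not from the paper): odd window size, stride 1,
    symmetric padding with a placeholder token at both ends.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Pad both ends so that every character has a full-width neighborhood.
    padded = [pad] * half + list(chars) + [pad] * half
    return [padded[i:i + window] for i in range(len(chars))]


# Example: each character is paired with its local neighborhood.
sentence = list("南京市长江大桥")
for ch, sub in zip(sentence, sliding_subsequences(sentence, window=3)):
    print(ch, sub)
```

With stride 1 and symmetric padding, the number of subsequences equals the number of characters, so each character keeps its own local context for later per-character tagging.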

Two-Character Chinese Word Extraction Based on Hybrid of Internal and Contextual Measures

Chinese Word Extraction Based on the Internal Associative Strength of Character Strings

Word extraction based on semantic constraints in Chinese word-formation

Accessor variety criteria for Chinese word extraction

Disyllabic Chinese Word Extraction Based on Character Thesaurus and Semantic Constraints in Word-Formation

A Local Information Perception Enhancement–Based Method for Chinese NER

Automatic Extraction of Multiword Expressions Combining Statistical and Similarity Approaches

Improved Word Similarity Computation for Chinese Using Sub-word Information

COMPARISON AND COMBINATION OF CONFIDENCE MEASURES IN ISOLATE WORD RECOGNITION

Research on Automatic Chinese Multi-word Term Extraction Based on Term Component

Automatic Extraction and Filtration of Multiword Units

Chinese Keyword Extraction Based on N-Gram and Word Co-occurrence

Chinese Word Similarity Computing Based on Combination Strategy

A Discriminative Latent Variable Chinese Segmenter with Hybrid Word/Character Information

Application-Oriented Comparison and Evaluation of Six Semantic Similarity Measures Based on Wordnet

Optimal evaluation of feature extraction method based on three types of measure values

Research on Automatic Chinese Multi-word Term Extraction Based on Integration of Web Information and Term Component

Combination Methods of Chinese Character and Word Embeddings in Deep Learning

Joint Learning of Character and Word Embeddings

Association Measures for Collocation Extraction

Chinese Word Segmentation without Using Dictionary Based on Unsupervised Learning Strategy