Abstract: Chinese characters are often composed of subcharacter components that are themselves semantically informative, and these component-level internal semantic features carry additional information that benefits the semantic representation of the character. Several studies have therefore used subcharacter component information (e.g., radicals, fine-grained components, and stroke n-grams) to improve Chinese character representation. However, we argue that the best way to model and encode a Chinese character has not been fully explored. To improve the representation of a character, existing methods introduce more component-level internal semantic features, but in doing so they also introduce more semantically irrelevant subcharacter component information, which acts as noise. Moreover, existing methods cannot discriminate the importance of the introduced subcharacter components, and so cannot filter out the noisy ones. In this paper, we first decompose Chinese characters into components according to their formations, and then model a character and its decomposed components as a graph structure, which we call the Chinese character formation graph. This graph preserves the azimuth (relative position) relationships among subcharacter components and is well suited to explicitly modeling the component-level internal semantic features of a character. Furthermore, we propose a novel model, the Chinese Character Formation Graph Attention Network (FGAT), which can discriminate the importance of the introduced subcharacter components and efficiently extract component-level internal semantic features. To demonstrate the effectiveness of our approach, we conducted extensive experiments. The experimental results show that our model achieves better results than state-of-the-art (SOTA) approaches.
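
The two ideas in the abstract, a formation graph over a character's components and attention that weights those components, can be illustrated with a minimal sketch. It is not the FGAT implementation. The decomposition table, the `left`/`right` position labels, the function names, and the random embeddings are all assumptions made for illustration, and the scoring follows the generic single-head graph attention recipe rather than any formulation confirmed by the abstract. The position labels are stored on the edges, but this simplified attention step does not use them.

```python
import math
import random

# Hypothetical decomposition table: character -> [(component, position)].
# Splitting 好 into 女 (left) and 子 (right) is the standard decomposition;
# the table layout and position labels are assumptions for this sketch.
DECOMPOSITION = {
    "好": [("女", "left"), ("子", "right")],
    "明": [("日", "left"), ("月", "right")],
}


def build_formation_graph(char):
    """Return (nodes, edges) for a character and its components.

    Each edge is (character, component, position), so the relative
    position of every component is kept in the graph.
    """
    parts = DECOMPOSITION[char]
    nodes = [char] + [comp for comp, _ in parts]
    edges = [(char, comp, pos) for comp, pos in parts]
    return nodes, edges


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attend(char, edges, emb, W, a):
    """Generic single-head graph attention update for the character node.

    Scores component j as LeakyReLU(a . [W h_char ; W h_j]), applies a
    softmax over the components, and returns the attention weights plus
    the weighted sum of the projected component vectors.
    """
    h_c = matvec(W, emb[char])
    comps = [comp for _, comp, _ in edges]
    projected = [matvec(W, emb[comp]) for comp in comps]
    scores = [
        leaky_relu(sum(ai * x for ai, x in zip(a, h_c + h_j)))
        for h_j in projected
    ]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    out = [
        sum(alpha * h_j[k] for alpha, h_j in zip(alphas, projected))
        for k in range(len(h_c))
    ]
    return list(zip(comps, alphas)), out


# Toy run with random (untrained) parameters; DIM is arbitrary.
rng = random.Random(0)
DIM = 4
nodes, edges = build_formation_graph("好")
emb = {n: [rng.uniform(-1, 1) for _ in range(DIM)] for n in nodes}
W = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
a = [rng.uniform(-1, 1) for _ in range(2 * DIM)]
weights, char_vec = attend("好", edges, emb, W, a)
print(weights)
```

With trained parameters, a component that contributes little to the character's meaning would receive a small weight, which is the filtering behavior the abstract describes. Here the weights are arbitrary because the parameters are random.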

Improving Chinese Character Representation with Formation Graph Attention Network