Abstract: Text classification is a classic and important application in natural language processing (NLP). Recent studies have shown that graph neural networks (GNNs) are effective on tasks with rich structural relationships and serve as effective transductive learning approaches. Text representation learning methods based on large-scale pretraining can learn implicit but rich semantic information from text. However, few studies have comprehensively utilized both contextual semantic information and structural information for Chinese text classification. Moreover, existing GNN methods for text classification have not considered whether their graph construction methods are applicable to long or short texts. In this work, we propose Chinese-BERTology-wwm-GCN, a framework that combines the Chinese bidirectional encoder representations from transformers (BERT) series of models with whole word masking (Chinese-BERTology-wwm) and the graph convolutional network (GCN) for Chinese text classification. When building the text graph, we use documents and words as nodes to construct a heterogeneous graph for the entire corpus. Specifically, we use term frequency-inverse document frequency (TF-IDF) for the word-document edge weights. For long text corpora, we propose an improved pointwise mutual information (PMI*) measure that accounts for word co-occurrence distances and use it for the word-word edge weights. For short text corpora, co-occurrence information between words is often limited, so we instead use cosine similarity for the word-word edge weights. During training, we combine the cross-entropy and hinge losses and use them to jointly train Chinese-BERTology-wwm and the GCN. Experiments show that the proposed framework significantly outperforms the baselines on three Chinese benchmark datasets and achieves good performance even with few labeled training samples.
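The graph construction described in the abstract can be illustrated with a minimal sketch. It assumes pre-tokenized documents, an unsmoothed TF-IDF variant for word-document edges, and the standard sliding-window PMI for word-word edges. The abstract does not give the definition of the distance-aware PMI* measure, so it is not reproduced here. The function names `tfidf_edges` and `pmi_edges` and the `window_size` parameter are hypothetical choices made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_edges(docs):
    """Return {(doc_index, word): weight} using an unsmoothed TF-IDF (assumed variant)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    edges = {}
    for i, doc in enumerate(docs):
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word])
            edges[(i, word)] = tf * idf
    return edges


def pmi_edges(docs, window_size=3):
    """Return {(word_a, word_b): PMI} for pairs with positive standard PMI.

    This is plain sliding-window PMI, not the paper's PMI* measure.
    """
    windows = []
    for doc in docs:
        if not doc:
            continue
        if len(doc) <= window_size:
            windows.append(doc)
        else:
            windows.extend(
                doc[i:i + window_size] for i in range(len(doc) - window_size + 1)
            )
    n_windows = len(windows)
    word_count = Counter()  # windows containing each word
    pair_count = Counter()  # windows containing both words of a pair
    for window in windows:
        unique = sorted(set(window))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))
    edges = {}
    for (a, b), n_ab in pair_count.items():
        # PMI = log( p(a, b) / (p(a) * p(b)) ), with probabilities estimated over windows.
        pmi = math.log(n_ab * n_windows / (word_count[a] * word_count[b]))
        if pmi > 0:
            edges[(a, b)] = pmi
    return edges


docs = [["a", "b", "c"], ["a", "b"], ["c", "d"]]
word_doc = tfidf_edges(docs)
word_word = pmi_edges(docs, window_size=3)
```

In this sketch only word pairs with positive PMI receive an edge, which is a common convention for corpus-level text graphs. Whether the proposed framework applies the same threshold is not stated in the abstract.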

Self-training Method Based on GCN for Semi-Supervised Short Text Classification

Self-supervised Short Text Classification with Heterogeneous Graph Neural Networks

A Novel Method Using Local Feature to Enhance GCN for Text Classification

Continual Graph Convolutional Network for Text Classification

Every node counts: Self-ensembling graph convolutional networks for semi-supervised learning

Chinese text classification by combining Chinese-BERTology-wwm and GCN

Graph Contrastive Learning via Cluster-refined Negative Sampling for Semi-supervised Text Classification

Semi-supervised Dynamic Counter Propagation Network

Multi-label text classification based on semantic-sensitive graph convolutional network

Self-Taught convolutional neural networks for short text clustering

Self-SAGCN: Self-Supervised Semantic Alignment for Graph Convolution Network

HGAT: Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification

Weakly-supervised Text Classification Based on Keyword Graph

Simplified-Boosting Ensemble Convolutional Network for Text Classification

Deeper Insights Into Graph Convolutional Networks for Semi-Supervised Learning

Text classification on heterogeneous information network via enhanced GCN and knowledge

Heterogeneous graph contrastive learning with adaptive data augmentation for semi‐supervised short text classification

Understanding Graph Convolutional Networks for Text Classification

InducT-GCN: Inductive Graph Convolutional Networks for Text Classification

Self-supervised Training of Graph Convolutional Networks

Weakly-Supervised Neural Text Classification