Text classification on heterogeneous information network via enhanced GCN and knowledge
Hui Li,Yan Yan,Shuo Wang,Juan Liu,Yunpeng Cui
DOI: https://doi.org/10.1007/s00521-023-08494-0
2023-03-30
Neural Computing and Applications
Abstract:Graph convolutional networks-based text classification methods have shown impressive success in further improving the classification results by considering the structural relationship between words and texts. However, existing GCN-based text classification methods tend to ignore the semantic representation of the node and the global structural information among nodes. Besides, only the word granularity information within the text, i.e., endogenous source, is used to represent the text. Furthermore, the existing graph convolutional network approaches are faced with major challenges to handle large and dense graphs, i.e., neighbor explosion and noisy inputs. To address these shortcomings, this paper proposes an inductive learning-based text classification method that utilizes representation learning on heterogeneous information networks and exogenous knowledge. Firstly, a weighted heterogeneous information network for text (HINT) is constructed by introducing exogenous knowledge, in which the node types cover text, entities and words. The unstructured text is represented as a structured heterogeneous information network, which expands the granularity of text features and makes full use of the exogenous structural information and explicit semantic information to enhance the interpretability of text information. Besides, we also enhanced the graph neural network against the challenges of neighbor explosion and noisy inputs derived from HINT using two strategies: graph sampling and Dropedge, for semi-supervised learning with improved classification performance. The effectiveness of our model is demonstrated by examining four publicly available text classification datasets. Based on experimental results, our approach achieves state-of-the-art performance on the text classification datasets.
computer science, artificial intelligence