Hierarchical Multi-granularity Interaction Graph Convolutional Network for Long Document Classification
Tengfei Liu, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin
DOI: https://doi.org/10.1109/taslp.2024.3369530
2024-01-01
Abstract: With the growing demand for text analytics, long document classification (LDC) has received extensive attention, and great progress has been made. To reveal the complex structure of long documents and extract their intrinsic features, current approaches either model a long sequence with sparse attention or represent only part of the hierarchy, such as word-sentence or word-section relations. However, the full hierarchical structure of long documents, from words through sentences to sections, remains relatively unexplored. To address this gap, we propose a novel Hierarchical Multi-granularity Interaction Graph Convolutional Network (HMIGCN) for long document classification, in which graphs at three granularities, i.e., a section graph, a sentence graph and a word graph, are constructed hierarchically. The section graph encapsulates the macrostructure of a long document, while the sentence and word graphs capture the document's microstructure. Notably, within the sentence graph, we introduce a Global-Local Graph Convolutional (GLGC) block to adaptively capture both global and local dependency structures among sentence nodes. Additionally, to integrate the three graph networks as a whole, we propose two techniques, a section-guided pooling block and a transfer fusion block, so that the three networks are trained jointly and reinforce one another. Extensive experiments on five long document datasets show that our model outperforms existing state-of-the-art LDC models.
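The section, sentence and word hierarchy described in the abstract, followed by a graph convolution over one of the resulting graphs, can be illustrated with a minimal sketch. This is not the authors' HMIGCN implementation: the function names (`build_hierarchy`, `normalized_adjacency`, `gcn_layer`), the blank-line and punctuation-based segmentation, the chain-shaped section graph, and the use of the standard symmetric-normalized graph convolution are all assumptions made for illustration. The GLGC, section-guided pooling and transfer fusion blocks are not reproduced here.

```python
import math
import re


def build_hierarchy(document):
    """Split a document into sections -> sentences -> words.

    Assumption: sections are separated by blank lines and sentences end
    with '.', '!' or '?'. The paper's own segmentation may differ.
    """
    hierarchy = []
    for section in re.split(r"\n\s*\n", document.strip()):
        sentences = [s.strip()
                     for s in re.split(r"(?<=[.!?])\s+", section.strip())
                     if s.strip()]
        hierarchy.append([re.findall(r"\w+", s.lower()) for s in sentences])
    return hierarchy


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the standard GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, features, weight):
    """One graph convolution, ReLU(A_norm @ X @ W), on plain Python lists."""
    a_norm = normalized_adjacency(adj)
    n, d_in, d_out = len(features), len(weight), len(weight[0])
    # Aggregate neighbour features: A_norm @ X
    agg = [[sum(a_norm[i][k] * features[k][f] for k in range(n))
            for f in range(d_in)] for i in range(n)]
    # Linear transform followed by ReLU: max(0, agg @ W)
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(d_in)))
             for o in range(d_out)] for i in range(n)]


# Toy document with two sections (hypothetical text).
doc = "Intro one. Intro two.\n\nMethod one."
hierarchy = build_hierarchy(doc)

# Hypothetical section graph: consecutive sections are linked.
section_adj = [[0.0, 1.0], [1.0, 0.0]]
section_feats = [[1.0, 0.0], [0.0, 1.0]]  # one-hot placeholder features
identity_w = [[1.0, 0.0], [0.0, 1.0]]     # placeholder weight matrix
out = gcn_layer(section_adj, section_feats, identity_w)
```

In this toy setup each section node ends up with an equal mix of its own and its neighbour's features, which is the basic smoothing behaviour that a graph convolution contributes at any of the three granularities.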
engineering, electrical & electronic; acoustics