A multi-semantic passing framework for semi-supervised long text classification
Wei Ai,Ze Wang,Hongen Shao,Tao Meng,Keqin Li
DOI: https://doi.org/10.1007/s10489-023-04556-x
IF: 5.3
2023-03-31
Applied Intelligence
Abstract: As an important task in natural language processing (NLP), text classification has flourished with the rise of deep learning techniques. However, existing deep learning methods face challenges as the length of the input text increases. Many long text classification methods truncate the text or simply extract keywords, which leads to the loss of rich semantic and structural information. Furthermore, there is great demand for semi-supervised long text classification because labeled training data are scarce and long texts in different styles are generated continuously. To alleviate these problems, we propose a heterogeneous attention network method based on a multi-semantic passing framework. In particular, we develop a flexible heterogeneous information graph that models long texts by extracting information including keywords, entities, titles, and their multiple interrelations. It effectively integrates semantic relationships and condenses global information, preserving the significant semantic and structural information. Furthermore, we design a multi-semantic passing framework capable of extracting the semantic and structural information in the constructed heterogeneous information graph according to the semantic degree of specific structures. Experiments on four real-world datasets (ThuCNews, SougouNews, 20NG, and Ohsumed) yield accuracies of 98.13%, 98.69%, 87.62%, and 71.46%, respectively, outperforming existing methods.
computer science, artificial intelligence
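
The heterogeneous information graph described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names (`id`, `title`, `keywords`, `entities`), the edge-type labels, and the function `build_hetero_graph` are assumptions made for illustration, and keyword and entity extraction are assumed to have been done beforehand.

```python
from collections import defaultdict


def build_hetero_graph(docs):
    """Build a typed graph from documents (hypothetical input format).

    Each node is a (node_type, value) pair; edges are grouped by edge type.
    """
    nodes = set()
    edges = defaultdict(set)  # edge type -> set of (source, target) pairs
    for doc in docs:
        d = ("doc", doc["id"])
        t = ("title", doc["title"])
        nodes.update([d, t])
        edges["doc-title"].add((d, t))
        for kw in doc["keywords"]:
            k = ("keyword", kw)
            nodes.add(k)  # shared keywords collapse into one node
            edges["doc-keyword"].add((d, k))
        for ent in doc["entities"]:
            e = ("entity", ent)
            nodes.add(e)
            edges["doc-entity"].add((d, e))
    return nodes, edges


# Toy documents, invented purely for illustration.
docs = [
    {"id": 0, "title": "Graph networks for text",
     "keywords": ["graph", "attention"], "entities": ["NLP"]},
    {"id": 1, "title": "Attention for long documents",
     "keywords": ["attention"], "entities": []},
]
nodes, edges = build_hetero_graph(docs)
```

Because the keyword "attention" appears in both toy documents, it becomes a single node linked to both, which is the kind of cross-text connection a message-passing model over such a graph could exploit.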