Convolutional Long Short-term Memory for Long Length Document Classification

Tian-jing JIANG, Xin HE, Jun HE, Jiao FENG, Peng LI
DOI: https://doi.org/10.3969/j.issn.1000-1220.2019.11.012
2019-01-01
Abstract: The Internet has become an important platform for disseminating information, and users need to quickly extract the information they want from large numbers of documents based on keywords. This requires papers to be clearly classified and labelled. Traditional document classification methods analyze texts by extracting keywords or key sentences. Because these methods rely on partial information, their classifications are often not clear enough and cause confusion, especially for scientific papers in similar research directions. In this paper, we propose a method that analyzes long documents and automatically generates their labels from global information. To reduce the depth of the convolutional neural network (CNN) while still capturing global information, the proposed method first splits the document into multiple parts using a random sampling algorithm. The CNN then extracts the local features of each part, and a Long Short-Term Memory (LSTM) network associates these local features. Simulation results show that, compared with classification methods based on local information, the proposed method improves accuracy when classifying long documents in similar research directions and also speeds up training.
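The abstract names three stages (random sampling into parts, a CNN over each part, an LSTM over the part features) but does not give their details. The following is a minimal, untrained forward pass in plain Python to make the data flow concrete. It assumes a stratified sampling scheme (one random window per equal-size region, kept in document order), a single convolutional layer with ReLU and max-over-time pooling, a bias-free single-layer LSTM, and a softmax output. All function names, sizes, and the sampling scheme are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Small random weights stand in for trained parameters (hypothetical).
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in w]

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def sample_parts(tokens: List[int], num_parts: int, part_len: int,
                 rng: random.Random) -> List[List[int]]:
    """Assumed scheme: split the document into num_parts equal regions and draw
    one random window of part_len tokens from each, keeping document order."""
    region = max(1, len(tokens) // num_parts)
    parts = []
    for i in range(num_parts):
        lo = i * region
        hi = max(lo, min(len(tokens) - part_len, lo + region - part_len))
        start = rng.randint(lo, hi)
        window = tokens[start:start + part_len]
        parts.append(window + [0] * (part_len - len(window)))  # pad with id 0
    return parts

def cnn_features(part: List[int], emb: Matrix, filters: List[Matrix]) -> Vector:
    """1-D convolution over word embeddings, ReLU, then max-over-time pooling.
    Requires len(part) >= filter width."""
    x = [emb[t] for t in part]
    feats = []
    for f in filters:  # each filter is a (width x emb_dim) matrix
        width = len(f)
        acts = [max(0.0, sum(sum(a * b for a, b in zip(f[k], x[s + k]))
                             for k in range(width)))
                for s in range(len(x) - width + 1)]
        feats.append(max(acts))
    return feats

def lstm_last_hidden(seq: List[Vector], p: Dict[str, Matrix], hidden: int) -> Vector:
    """Single-layer LSTM over the part features (biases omitted for brevity)."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = x + h  # concatenate input and previous hidden state
        i = [sigmoid(v) for v in matvec(p["Wi"], z)]
        f = [sigmoid(v) for v in matvec(p["Wf"], z)]
        o = [sigmoid(v) for v in matvec(p["Wo"], z)]
        g = [math.tanh(v) for v in matvec(p["Wg"], z)]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h

def init_params(vocab, emb_dim, num_filters, width, hidden, num_classes, rng):
    gates = ("Wi", "Wf", "Wo", "Wg")
    return {
        "emb": rand_matrix(vocab, emb_dim, rng),
        "filters": [rand_matrix(width, emb_dim, rng) for _ in range(num_filters)],
        "lstm": {k: rand_matrix(hidden, num_filters + hidden, rng) for k in gates},
        "out": rand_matrix(num_classes, hidden, rng),
        "hidden": hidden,
    }

def classify(tokens: List[int], params, rng: random.Random,
             num_parts: int = 4, part_len: int = 8) -> Vector:
    parts = sample_parts(tokens, num_parts, part_len, rng)  # 1. random sampling
    feats = [cnn_features(pt, params["emb"], params["filters"])
             for pt in parts]  # 2. local features per part
    h = lstm_last_hidden(feats, params["lstm"], params["hidden"])  # 3. associate parts
    logits = matvec(params["out"], h)
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
params = init_params(vocab=50, emb_dim=6, num_filters=5, width=3,
                     hidden=4, num_classes=3, rng=rng)
doc = [rng.randrange(1, 50) for _ in range(200)]
probs = classify(doc, params, rng)
print([round(p, 3) for p in probs])  # class probabilities (untrained weights)
```

Keeping each sampled part short keeps the CNN shallow, while the LSTM carries information across parts in document order; this mirrors the trade-off the abstract describes. A practical system would learn these weights by training in a deep-learning framework rather than use random ones.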