Abstract: Many institutions provide investment advisory services that help stock investors make sound investment decisions. Industry analysts at these institutions must analyze huge volumes of financial news documents and produce investment advisory reports for service subscribers. Before such analysis can begin, automatic document classification is needed to organize the collected financial news documents into predefined fine-grained categories. Accurate fine-grained classification over massive financial document collections is challenging because documents from closely related fine-grained categories are semantically very similar, and existing classification methods may fail to capture the subtle differences between them. In this paper, we implement a document classification framework, named GraphSEAT, to classify financial documents for a leading financial information service provider in China. Specifically, we build a heterogeneous graph to model the global structure of the target financial documents: documents and financial named entities are nodes, and each document is connected by an edge to every named entity it contains. We then train a graph convolutional network (GCN) with attention mechanisms to learn, for each document, an embedding representation that carries domain information. We also extract semantic information from each document's word sequence with a neural sequence encoder. Finally, we fuse the two learned representations with attention mechanisms to form an overall embedding representation of the document and make the prediction. We perform extensive experiments on our real-world financial news dataset and three public datasets to evaluate the framework, and the experimental results demonstrate that GraphSEAT outperforms all eight compared baseline models, especially on our dataset.
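The pipeline outlined in the abstract (document-entity graph, graph convolution, attention-based fusion) can be illustrated with a minimal NumPy sketch. This is not the GraphSEAT implementation: the function names, toy data, random weights, and one-hot node features are all assumptions made for illustration. The graph layer uses plain symmetric-normalized GCN propagation and omits the attention mechanisms the abstract mentions inside the graph network, and a random matrix stands in for the output of the neural sequence encoder.

```python
import numpy as np

def build_doc_entity_adjacency(doc_entities, num_entities):
    """Adjacency over [documents..., entities...] with self-loops (hypothetical helper)."""
    num_docs = len(doc_entities)
    n = num_docs + num_entities
    adj = np.eye(n)  # self-loops so each node keeps its own features
    for d, ents in enumerate(doc_entities):
        for e in ents:
            # Undirected edge between a document and an entity it contains.
            adj[d, num_docs + e] = 1.0
            adj[num_docs + e, d] = 1.0
    return adj

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj_norm, features, weight):
    """One graph convolution step: ReLU(A_norm X W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(graph_emb, seq_emb, query):
    """Fuse two per-document embeddings with softmax attention weights."""
    stacked = np.stack([graph_emb, seq_emb], axis=1)     # (docs, 2, dim)
    scores = np.tanh(stacked) @ query                    # (docs, 2)
    weights = softmax(scores, axis=1)                    # (docs, 2)
    fused = (weights[:, :, None] * stacked).sum(axis=1)  # (docs, dim)
    return fused, weights

rng = np.random.default_rng(0)
doc_entities = [[0, 1], [1], [2]]  # toy data: entity ids contained in each document
num_docs, num_entities, dim = 3, 3, 4

adj = build_doc_entity_adjacency(doc_entities, num_entities)
adj_norm = normalize_adjacency(adj)
features = np.eye(num_docs + num_entities)  # one-hot node features (assumption)
weight = rng.normal(size=(num_docs + num_entities, dim))
graph_emb = gcn_layer(adj_norm, features, weight)[:num_docs]  # document nodes only

seq_emb = rng.normal(size=(num_docs, dim))  # stand-in for a sequence encoder output
query = rng.normal(size=dim)                # attention query vector (random here, learned in practice)
fused, weights = attention_fuse(graph_emb, seq_emb, query)
```

In this sketch, documents 0 and 1 share entity 1, so a second propagation step would let their embeddings influence each other through that entity node; this is the kind of global, cross-document signal the graph is meant to capture. In a trained model, `fused` would feed a classification layer.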

Weight attention layer‐based document classification incorporating information gain

Bi-LSTM Model to Increase Accuracy in Text Classification: Combining Word2vec CNN and Attention Mechanism

Convolutional Long Short-term Memory for Long Length Document Classification

Improving Multi-Label Text Classification Using Weighted Information Gain and Co-Trained Multinomial Naïve Bayes Classifier

Hierarchical Attentional Hybrid Neural Networks for Document Classification

Performance Analysis of Hybrid Deep Learning Models with Attention Mechanism Positioning and Focal Loss for Text Classification

Key-Guided Identity Document Classification Method by Graph Attention Network

Label-Attentive Hierarchical Attention Network for Text Classification

Long Document Classification From Local Word Glimpses via Recurrent Attention Learning

Fusing Global Domain Information and Local Semantic Information to Classify Financial Documents

CWC: A Clustering-Based Feature Weighting Approach for Text Classification

Text classification based on ensemble extreme learning machine

Optimizing News Text Classification with Bi-LSTM and Attention Mechanism for Efficient Data Processing

Hierarchical-Document-Structure-Aware Attention with Adaptive Cost Sensitive Learning for Biomedical Document Classification

An Improved Double Channel Long Short-Term Memory Model for Medical Text Classification

Misclassification-guided loss under the weighted cross-entropy loss framework

HGATT_LR: transforming review text classification with hypergraphs attention layer and logistic regression

Attention Mechanisms in Clinical Text Classification: A Comparative Evaluation

Improved text classification methods based on weighted adjustments

Bidirectional LSTM with attention mechanism and convolutional layer for text classification

Improving Text Classification Using Local Latent Semantic Indexing