Abstract: Many institutions provide investment advisory services that help stock investors make sound investment decisions. Industry analysts at these institutions must analyze huge volumes of financial news documents and produce investment advisory reports for service subscribers. Before analysis, automatic document classification is needed to organize the collected financial news documents into predefined fine-grained categories. Accurate fine-grained classification over massive financial document collections is challenging because documents from closely related fine-grained categories are semantically very similar, and existing classification methods may fail to capture the subtle differences between them. In this paper, we implement a document classification framework, named GraphSEAT, to classify financial documents for a leading financial information service provider in China. Specifically, we build a heterogeneous graph to model the global structure of our target financial documents, in which documents and financial named entities are nodes and each document is connected by an edge to every named entity it contains. We then train a graph convolutional network (GCN) with attention mechanisms to learn an embedding representation of each document that contains domain information. We also extract semantic information from a document's word sequence with a neural sequence encoder. Finally, we fuse the two learned representations of the document with attention mechanisms to form an overall embedding representation and make the prediction. We perform extensive experiments on our real-world financial news dataset and three public datasets to evaluate the framework, and the results demonstrate that GraphSEAT outperforms all eight compared baseline models, especially on our dataset.
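The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the GraphSEAT implementation: the document names, entity names, and helper functions below are hypothetical, the attention mechanisms and learned weights are omitted, and the propagation step assumes the standard symmetric normalization commonly used with GCNs. It only shows how documents that share a named entity come to influence each other's representations after two propagation steps.

```python
from math import sqrt

def build_doc_entity_graph(doc_entities):
    """Build a document-entity graph; returns (nodes, adjacency with self-loops)."""
    docs = sorted(doc_entities)
    entities = sorted({e for ents in doc_entities.values() for e in ents})
    nodes = [("doc", d) for d in docs] + [("entity", e) for e in entities]
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop so a node keeps its own features
    for d, ents in doc_entities.items():
        for e in ents:
            i, j = index[("doc", d)], index[("entity", e)]
            adj[i][j] = adj[j][i] = 1.0  # undirected document-entity edge
    return nodes, adj

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (assumed, not from the paper)."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def propagate(norm_adj, features):
    """One weight-free propagation step: H' = A_hat @ H."""
    n, dim = len(norm_adj), len(features[0])
    return [[sum(norm_adj[i][k] * features[k][c] for k in range(n))
             for c in range(dim)] for i in range(n)]

# Hypothetical toy corpus: each document maps to the named entities it contains.
doc_entities = {
    "news_1": ["AcmeBank", "CentralBank"],
    "news_2": ["AcmeBank"],
    "news_3": ["OilCo"],
}
nodes, adj = build_doc_entity_graph(doc_entities)
a_hat = normalize(adj)

# One-hot input features, so column j tracks the influence of node j.
h0 = [[1.0 if i == j else 0.0 for j in range(len(nodes))] for i in range(len(nodes))]
h2 = propagate(a_hat, propagate(a_hat, h0))

# news_1 and news_2 share AcmeBank, so they influence each other after two steps;
# news_3 shares no entity with news_1, so its influence on news_1 stays zero.
print(h2[0][1] > 0, h2[0][2] == 0.0)
```

In a full model, each propagation step would also apply a learned weight matrix and a nonlinearity, and the resulting document embedding would be fused with the output of the sequence encoder.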

FinEmbedDiff: A Cost-Effective Approach of Classifying Financial Documents with Vector Sampling using Multi-modal Embedding Models

FETILDA: An Effective Framework For Fin-tuned Embeddings For Long Financial Text Documents

LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents

Improving Document Classification with Multi-Sense Embeddings

FinDiff: Diffusion Models for Financial Tabular Data Generation

Fusing Global Domain Information and Local Semantic Information to Classify Financial Documents

Multi-Modal Financial Time-Series Retrieval Through Latent Space Projections

Attentive statement fraud detection: Distinguishing multimodal financial data with fine-grained attention

Contrastive Learning of Asset Embeddings from Financial Time Series

Leveraging Domain Information to Classify Financial Documents via Unsupervised Graph Momentum Contrast

FETILDA: An Evaluation Framework for Effective Representations of Long Financial Documents

A Multimodal Embedding-Based Approach to Industry Classification in Financial Markets

IITK at the FinSim Task: Hypernym Detection in Financial Domain via Context-Free and Contextualized Word Embeddings

SSCDV: Social media document embedding with sentiment and topics for financial market forecasting

Summarizing Charts of Financial Document Via Context-Aware Multi-Modeling

Financial Fraud Detection Approach Based on Firefly Optimization Algorithm and Support Vector Machine

Greenback Bears and Fiscal Hawks: Finance is a Jungle and Text Embeddings Must Adapt

A Financial Embedded Vector Model and Its Applications to Time Series Forecasting

Financial Fraud Detection using Deep Support Vector Data Description

GuideWalk -- Heterogeneous Data Fusion for Enhanced Learning -- A Multiclass Document Classification Case

Predicting financial distress using multimodal data: An attentive and regularized deep learning method