FLAG: Financial Long Document Classification via AMR-based GNN

Bolun "Namir" Xia, Aparna Gupta, Mohammed J. Zaki
2024-10-23
Abstract: The advent of large language models (LLMs) has initiated much research into their various financial applications. However, in applying LLMs on long documents, semantic relations are not explicitly incorporated, and a full or arbitrarily sparse attention operation is employed. In recent years, progress has been made in Abstract Meaning Representation (AMR), which is a graph-based representation of text to preserve its semantic relations. Since AMR can represent semantic relationships at a deeper level, it can be beneficially utilized by graph neural networks (GNNs) for constructing effective document-level graph representations built upon LLM embeddings to predict target metrics in the financial domain. We propose FLAG: Financial Long document classification via AMR-based GNN, an AMR graph based framework to generate document-level embeddings for long financial document classification. We construct document-level graphs from sentence-level AMR graphs, endow them with specialized LLM word embeddings in the financial domain, apply a deep learning mechanism that utilizes a GNN, and examine the efficacy of our AMR-based approach in predicting labeled target data from long financial documents. Extensive experiments are conducted on a dataset of quarterly earnings calls transcripts of companies in various sectors of the economy, as well as on a corpus of more recent earnings calls of companies in the S&P 1500 Composite Index. We find that our AMR-based approach outperforms fine-tuning LLMs directly on text in predicting stock price movement trends at different time horizons in both datasets. Our work also outperforms previous work utilizing document graphs and GNNs for text classification.
Computational Engineering, Finance, and Science; Artificial Intelligence; Computation and Language; Machine Learning
### What problems does this paper attempt to solve?

This paper addresses three problems in the classification of long financial documents:

1. **Explicit Incorporation of Semantic Relations**
   - Existing large language models (LLMs) do not explicitly incorporate semantic relations when processing long documents. They rely on full attention or arbitrarily sparse attention, which can lose semantic information.
   - To address this, the paper uses Abstract Meaning Representation (AMR), a graph-based text representation that preserves the semantic relations of the text.
2. **Effective Representation of Long Documents**
   - Long documents, such as quarterly earnings call transcripts, usually exceed the maximum context length of LLMs, which makes it difficult to generate effective document-level representations.
   - The paper proposes a new framework, FLAG (Financial Long Document Classification via AMR-based GNN), which builds document-level representations from AMR graphs with graph neural networks (GNNs) to better capture the semantic information of long documents.
3. **Improving Text Classification Performance in the Financial Domain**
   - In finance, text data such as earnings call transcripts are crucial for tasks such as predicting stock price trends. Existing methods struggle with these long documents.
   - By combining AMR and GNNs, the paper improves trend-label classification for stock price movement across different time horizons.

### Main Contributions

- **Proposing the FLAG Framework**: The framework uses AMR to construct document-level graphs from sentence-level graphs and uses a GNN to learn effective document-level representations.
- **Experimentally Demonstrating Superiority**: Extensive experiments on two datasets show that FLAG outperforms existing methods in predicting stock price trends, including directly fine-tuned LLMs and earlier document-graph GNN approaches, achieving state-of-the-art (SOTA) performance.

### Method Overview

1. **Sentence-level AMR Parsing**: Parse each sentence into an AMR graph and aggregate these sentence-level graphs into a document-level graph through a hierarchical method.
2. **Document-level Graph Construction**: Use virtual nodes to connect the sentence-level graphs into a single document-level graph, and initialize the node embeddings.
3. **GNN Model Training and Fine-tuning**: Apply GNN models such as GATv2 to generate the final document representation and use it for downstream classification tasks.

With this approach, FLAG both captures the semantic relations in long documents more faithfully and predicts financial trend labels more accurately.
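
The document-level graph construction step in the method overview can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each sentence's AMR graph is already available as a list of node labels plus `(source, relation, target)` triples, and it links them through one virtual node per sentence and one virtual node for the whole document. The names `DocGraph` and `build_document_graph`, and the edge labels `:in-sentence` and `:in-document`, are hypothetical and chosen only for illustration. The toy AMR graphs are hand-written, not parser output.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# One sentence-level graph: (node labels, triples over local node indices).
SentenceGraph = Tuple[List[str], List[Tuple[int, str, int]]]


@dataclass
class DocGraph:
    """A tiny labelled directed graph (hypothetical helper, for illustration)."""
    labels: List[str] = field(default_factory=list)                  # node id -> label
    edges: List[Tuple[int, int, str]] = field(default_factory=list)  # (src, dst, relation)

    def add_node(self, label: str) -> int:
        self.labels.append(label)
        return len(self.labels) - 1

    def add_edge(self, src: int, dst: int, relation: str) -> None:
        self.edges.append((src, dst, relation))


def build_document_graph(sentence_graphs: List[SentenceGraph]) -> DocGraph:
    """Merge sentence-level graphs into one graph using virtual nodes.

    Assumed scheme: every AMR node points to its sentence's virtual node,
    and every sentence's virtual node points to a single document node.
    """
    doc = DocGraph()
    doc_node = doc.add_node("<doc>")
    for i, (labels, triples) in enumerate(sentence_graphs):
        sent_node = doc.add_node(f"<sent_{i}>")
        doc.add_edge(sent_node, doc_node, ":in-document")
        offset = len(doc.labels)  # global id of this sentence's first AMR node
        for label in labels:
            node = doc.add_node(label)
            doc.add_edge(node, sent_node, ":in-sentence")
        # Re-index the sentence's own AMR relations into global node ids.
        for src, relation, dst in triples:
            doc.add_edge(offset + src, offset + dst, relation)
    return doc


# Toy input: hand-written AMR-like graphs for "Revenue grew." and "Margins fell."
toy_sentences: List[SentenceGraph] = [
    (["grow-01", "revenue"], [(0, ":ARG1", 1)]),
    (["fall-01", "margin"], [(0, ":ARG1", 1)]),
]

doc = build_document_graph(toy_sentences)
print(len(doc.labels), "nodes,", len(doc.edges), "edges")
for src, dst, relation in doc.edges:
    print(f"{doc.labels[src]} -{relation}-> {doc.labels[dst]}")
```

In a full pipeline, each node label would then be mapped to an embedding from a finance-domain language model, and a GNN such as GATv2 would pass messages over these edges. The representation read off the document node, or a pooling over all nodes, would feed the classifier. Those stages are omitted here because the summary does not specify them.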