Abstract: The financial sphere has accumulated a wealth of unstructured data, such as the textual disclosure documents that companies regularly submit to regulatory agencies like the Securities and Exchange Commission (SEC). These documents are typically very long and tend to contain valuable soft information about a company’s performance that is not present in quantitative predictors. It is therefore of great interest to learn predictive models from these long textual documents, especially for forecasting numerical key performance indicators (KPIs). In recent years, natural language processing has made great progress through pre-trained language models (LMs) learned from large corpora of textual data. This raises two questions: whether such LMs can be used effectively to produce representations for long documents, and how to evaluate the effectiveness of the representations produced by various LMs. Our work addresses these questions by evaluating how well various LMs extract useful soft information from long textual documents for prediction tasks. We propose and implement a deep learning evaluation framework that combines a sequential chunking approach with an attention mechanism. We perform an extensive set of experiments on a collection of 10-K reports submitted annually by US banks, and on another dataset of reports submitted by US companies, to investigate thoroughly the performance of different types of language models. Overall, our framework using LMs outperforms strong baseline methods for textual modeling as well as for numerical regression. Our work provides better insights into how pre-trained domain-specific and fine-tuned long-input LMs can improve the quality of long-document representations and, in turn, help improve predictive analyses.
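As a rough illustration of what "sequential chunking combined with an attention mechanism" can mean in practice, the sketch below splits a long token sequence into consecutive chunks, encodes each chunk, and pools the chunk vectors with softmax attention weights. It is a minimal sketch under stated assumptions, not the authors' implementation: `toy_encoder` is a hypothetical stand-in for a pre-trained LM, the `query` vector would normally be learned, and all function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple


def chunk_tokens(tokens: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split a token sequence into consecutive, non-overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    chunk_vectors: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Weight chunk vectors by softmax(dot(vector, query)) and sum them."""
    scores = [sum(v * q for v, q in zip(vec, query)) for vec in chunk_vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [sum(w * vec[d] for w, vec in zip(weights, chunk_vectors)) for d in range(dim)]
    return pooled, weights


def toy_encoder(chunk: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for a pre-trained LM: two hand-made features."""
    n = len(chunk)
    avg_len = sum(len(t) for t in chunk) / n
    risk_share = sum(t.lower() in {"risk", "loss"} for t in chunk) / n
    return [avg_len, risk_share]


def embed_document(
    tokens: Sequence[str],
    chunk_size: int,
    encoder: Callable[[Sequence[str]], List[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Chunk the document, encode each chunk, and attention-pool the results."""
    chunks = chunk_tokens(tokens, chunk_size)
    vectors = [encoder(c) for c in chunks]
    return attention_pool(vectors, query)


if __name__ == "__main__":
    doc = "the bank reported a loss and credit risk rose".split()
    pooled, weights = embed_document(doc, 4, toy_encoder, [0.0, 1.0])
    print("attention weights per chunk:", weights)
    print("document vector:", pooled)
```

In a real pipeline the pooled document vector would feed a regression head that predicts a KPI, and the chunk size would be set by the LM's maximum input length.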
