GPT-InvestAR: Enhancing Stock Investment Strategies through Annual Report Analysis with Large Language Models

Udit Gupta
2023-09-07
Abstract: Annual reports of publicly listed companies contain vital information about their financial health, which can help assess the potential impact on the firm's stock price. These reports are comprehensive, running up to, and sometimes beyond, 100 pages. Analysing them is cumbersome even for a single firm, let alone for the whole universe of listed firms. Over the years, financial experts have become proficient at extracting valuable information from these documents relatively quickly, but this takes years of practice and experience. This paper aims to simplify the assessment of annual reports across all firms by leveraging the capabilities of Large Language Models (LLMs). The insights generated by the LLM are compiled into a quant-style dataset and augmented with historical stock price data. A machine learning model is then trained with the LLM outputs as features. Walk-forward test results show promising outperformance relative to S&P 500 returns. The paper is intended to provide a framework for future work in this direction, and to facilitate this, the code has been released as open source.
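The walk-forward evaluation mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's released code: it assumes yearly rebalancing, an expanding training window, and an equal-weight portfolio of the k highest-scored stocks compared against a benchmark return. The function names `walk_forward_splits` and `top_k_excess_return` are hypothetical.

```python
from statistics import mean
from typing import Iterator


def walk_forward_splits(years: list[int], min_train_years: int = 3) -> Iterator[tuple[list[int], int]]:
    """Yield (training years, test year) pairs with an expanding window.

    Hypothetical helper: each test year is scored by a model fit only on
    strictly earlier years, so no future information leaks into training.
    """
    ordered = sorted(set(years))
    for i in range(min_train_years, len(ordered)):
        yield ordered[:i], ordered[i]


def top_k_excess_return(predicted: dict[str, float], realized: dict[str, float],
                        benchmark_return: float, k: int) -> float:
    """Equal-weight return of the k highest-scored tickers minus the benchmark return.

    Hypothetical helper: `predicted` holds model scores, `realized` holds the
    returns actually observed over the test period.
    """
    picks = sorted(predicted, key=predicted.get, reverse=True)[:k]
    return mean(realized[t] for t in picks) - benchmark_return


if __name__ == "__main__":
    # Print which years would be used for training and testing in each step.
    for train_years, test_year in walk_forward_splits([2015, 2016, 2017, 2018, 2019, 2020]):
        print(f"train on {train_years[0]}-{train_years[-1]}, test on {test_year}")
```

An expanding window keeps every test year strictly out of sample; a fixed-length rolling window is a common alternative when older data is thought to be less relevant to current market conditions.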
Statistical Finance, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to improve stock investment strategies by using large language models (LLMs) to analyze the annual reports of publicly traded companies. Specifically, its goals are:

1. **Simplifying the analysis of annual reports**: Annual reports contain a wealth of important information about a company's financial health, but they are lengthy and complex, so analyzing even a single company is time-consuming. The paper proposes using LLMs (such as GPT-3.5) to automatically extract and analyze key information from these reports.
2. **Generating valuable features**: The insights generated by the LLM are compiled into a quantitative-style dataset and combined with historical stock price data to train machine learning models. These features are designed to capture key information about the company's financial and managerial aspects.
3. **Predicting stock performance**: The paper builds a machine learning model that takes the LLM-generated features as input and predicts which stocks will perform best over the following year. In walk-forward backtests, the stocks selected by the model outperform the S&P 500 index.
4. **Providing a framework for future research**: Beyond demonstrating the effectiveness of the method, the paper releases open-source code so that other researchers can explore and improve on this direction.

In summary, the paper combines LLMs and machine learning techniques to extract valuable information from complex annual reports and use it to improve stock investment strategies.
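To make the feature and target construction in points 2 and 3 concrete, here is a hypothetical sketch of turning LLM answers into a quant-style table. It assumes, purely for illustration, that each prompt asks the LLM to include a line such as `Score: 7` (out of 10), and that the target is the cross-sectional percentile rank of the one-year forward return measured from the report date. The names `parse_score`, `forward_return`, `percentile_ranks` and `build_rows` are invented; the paper's actual prompts, features and target may differ.

```python
import re
from typing import Optional

# Hypothetical answer format: the LLM is asked to state "Score: N" with N in [0, 10].
SCORE_PATTERN = re.compile(r"score\s*[:=]\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_score(answer: str, max_score: float = 10.0) -> Optional[float]:
    """Extract a numeric score from an LLM answer and scale it to [0, 1].

    Returns None when no score is found or it falls outside [0, max_score],
    so malformed answers become missing values rather than bad features.
    """
    match = SCORE_PATTERN.search(answer)
    if match is None:
        return None
    value = float(match.group(1))
    if not 0.0 <= value <= max_score:
        return None
    return value / max_score


def forward_return(price_at_report: float, price_one_year_later: float) -> float:
    """Simple one-year return measured from the annual report date."""
    return price_one_year_later / price_at_report - 1.0


def percentile_ranks(returns: dict[str, float]) -> dict[str, float]:
    """Rank each ticker's return within the cross-section, scaled to [0, 1].

    Ties keep their input order because sorted() is stable.
    """
    ordered = sorted(returns, key=returns.get)
    n = len(ordered)
    return {t: (i / (n - 1) if n > 1 else 0.5) for i, t in enumerate(ordered)}


def build_rows(answers: dict[str, list[str]],
               prices: dict[str, tuple[float, float]]) -> list[dict]:
    """Join parsed LLM scores (features) with percentile-ranked forward returns (target).

    `answers` maps ticker -> LLM answers, one per question; `prices` maps
    ticker -> (price at report date, price one year later). Tickers without
    price data are skipped.
    """
    returns = {t: forward_return(*prices[t]) for t in answers if t in prices}
    targets = percentile_ranks(returns)
    rows = []
    for ticker, target in targets.items():
        features = [parse_score(a) for a in answers[ticker]]
        rows.append({"ticker": ticker, "features": features, "target": target})
    return rows
```

Returning `None` for an unparseable answer lets the downstream model treat it as a missing value instead of silently scoring it as zero. Ranking returns within each year also removes market-wide moves from the target, so the model learns relative rather than absolute performance, which matches the goal of picking the best-performing stocks.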