A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models

Syed Affan Daimi, Asma Iqbal
2024-09-26
Abstract: The number of companies listed on the NYSE has been growing exponentially, creating a significant challenge for market analysts, traders, and stockholders who must monitor and assess the performance and strategic shifts of a large number of companies regularly. There is an increasing need for a fast, cost-effective, and comprehensive method to evaluate the performance and detect and compare many companies' strategy changes efficiently. We propose a novel data-driven approach that leverages large language models (LLMs) to systematically analyze and rate the performance of companies based on their SEC 10-K filings. These filings, which provide detailed annual reports on a company's financial performance and strategic direction, serve as a rich source of data for evaluating various aspects of corporate health, including confidence, environmental sustainability, innovation, and workforce management. We also introduce an automated system for extracting and preprocessing 10-K filings. This system accurately identifies and segments the required sections as outlined by the SEC, while also isolating key textual content that contains critical information about the company. This curated data is then fed into Cohere's Command-R+ LLM to generate quantitative ratings across various performance metrics. These ratings are subsequently processed and visualized to provide actionable insights. The proposed scheme is then implemented on an interactive GUI as a no-code solution for running the data pipeline and creating the visualizations. The application showcases the rating results and provides year-on-year comparisons of company performance.
Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the difficulty that market analysts, traders, and shareholders face in monitoring and evaluating the performance and strategic changes of a large number of listed companies. With the number of companies listed on the New York Stock Exchange (NYSE) growing exponentially, evaluating their performance quickly, cost-effectively, and comprehensively, and detecting and comparing their strategic changes, has become a significant problem. Traditional methods can provide valuable insights, but they are mostly narrative-based analyses, which makes it difficult to quickly obtain and compare the performance of multiple companies. The paper therefore proposes a new data-driven method that uses large language models (LLMs) to systematically analyze and rate companies based on their SEC 10-K filings. These filings provide detailed annual reports on a company's financial performance and strategic direction and are a rich source of data for evaluating various aspects of corporate health, including confidence, environmental sustainability, innovation, and workforce management. An automated system extracts and preprocesses the 10-K filings, identifies and segments the sections required by the SEC, and isolates the key textual content that contains critical information about the company. This curated data is fed into Cohere's Command-R+ LLM to generate quantitative ratings across various performance metrics, which are then processed and visualized to provide actionable insights. The paper also presents an interactive graphical user interface (GUI) as a no-code solution for running the data pipeline and creating the visualizations; it displays the rating results and provides year-on-year comparisons of company performance.
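To make the segmentation step concrete, here is a minimal sketch of how the text of a 10-K could be split into its SEC item sections. It is not the authors' implementation: the function name `split_10k_items`, the heading pattern, and the assumption that the filing has already been converted to plain text with each "Item N." heading at the start of a line are all illustrative choices.

```python
import re

# Assumed heading shape: "Item 1.", "ITEM 1A.", "Item 7A:" at the start of a line.
# Real filings vary a lot; this pattern is illustrative, not exhaustive.
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[ab]?)\s*[.:]",
    re.IGNORECASE | re.MULTILINE,
)


def split_10k_items(text: str) -> dict[str, str]:
    """Split plain-text 10-K content into sections keyed by item number.

    Hypothetical helper: returns e.g. {"1": "...", "1A": "...", "7": "..."}.
    """
    matches = list(ITEM_HEADING.finditer(text))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        key = match.group(1).upper()
        body = text[match.end():end].strip()
        # A table of contents repeats the headings with little text after them,
        # so keep the longest body seen for each item (empty bodies are skipped).
        if len(body) > len(sections.get(key, "")):
            sections[key] = body
    return sections
```

Each returned section could then be passed to the LLM with a rating prompt. Keeping the longest body per item is a simple heuristic for skipping table-of-contents entries and would need validation on real filings.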
The method is intended to improve the speed and accuracy of the analysis while reducing its cost, making large-scale evaluation feasible. With it, stakeholders can quickly evaluate and compare the performance of multiple companies across a wide range of criteria.
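The year-on-year comparison can be illustrated with a short sketch. It assumes the LLM ratings have already been parsed into numeric `(company, year, metric, score)` tuples. That record layout, the function name `year_on_year_changes`, and the numeric scale are assumptions, since the summary does not specify them.

```python
from collections import defaultdict


def year_on_year_changes(ratings):
    """Compute the score change between consecutive available years.

    ratings: iterable of (company, year, metric, score) tuples (assumed layout).
    Returns {(company, metric): [(year, score - previous_score), ...]}.
    """
    # Group scores by (company, metric), then by year.
    series = defaultdict(dict)
    for company, year, metric, score in ratings:
        series[(company, metric)][year] = score

    changes = {}
    for key, by_year in series.items():
        years = sorted(by_year)
        # Pair each year with the previous year that has a rating; a gap in
        # the data means the two years compared may not be adjacent.
        changes[key] = [
            (curr, by_year[curr] - by_year[prev])
            for prev, curr in zip(years, years[1:])
        ]
    return changes
```

The resulting deltas could feed a chart per metric, so that a drop in, say, the innovation rating between two filings stands out without reading either filing in full.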