Towards reducing hallucination in extracting information from financial reports using Large Language Models

Bhaskarjit Sarmah, Tianjie Zhu, Dhagash Mehta, Stefano Pasquali
2023-10-17
Abstract: For a financial analyst, the question and answer (Q&A) segment of a company's financial report is a crucial source of information for various analyses and investment decisions. However, extracting valuable insights from the Q&A section poses considerable challenges: conventional methods such as detailed reading and note-taking lack scalability and are susceptible to human error, while Optical Character Recognition (OCR) and similar techniques have difficulty accurately processing unstructured transcript text and often miss subtle linguistic nuances that drive investor decisions. Here, we demonstrate the use of Large Language Models (LLMs) to extract information from earnings report transcripts rapidly and with high accuracy, and we reduce hallucination by combining retrieval-augmented generation with metadata. We evaluate the outputs of various LLMs with and without our proposed approach using several objective metrics for evaluating Q&A systems, and empirically demonstrate the superiority of our method.
Computation and Language, Portfolio Management, Statistical Finance, Applications
What problem does this paper attempt to address?
This paper aims to address the hallucination problem that arises when extracting information from the Q&A sections of financial reports. Specifically, the paper explores how to efficiently and accurately extract valuable information from earnings call transcripts using large language models (LLMs), and how to reduce hallucinations by combining retrieval-augmented generation techniques and metadata. The researchers evaluated the performance of different pre-trained LLMs with and without their proposed methods, and assessed the Q&A system based on various objective metrics, empirically demonstrating the effectiveness of the approach. Additionally, the paper discusses how metadata assistance can improve accuracy in multi-document processing scenarios, thereby enhancing the reliability and precision of information extraction.
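The idea of combining retrieval with metadata can be illustrated with a minimal sketch. It is not the authors' implementation: the `Chunk` fields (`company`, `quarter`), the token-overlap ranking, the prompt wording, and the sample transcript sentences are all assumptions made for illustration. A real system would likely use embedding-based retrieval and pass the prompt to an LLM. The sketch only shows how filtering on metadata before ranking keeps passages from other companies' transcripts out of the context.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    """A transcript passage plus hypothetical metadata fields."""
    text: str
    company: str
    quarter: str


def tokenize(text):
    # Lowercased alphanumeric tokens; a stand-in for a real embedding model.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(chunks, question, company, quarter, k=2):
    # Step 1: metadata filter, so only the requested report is searched.
    candidates = [c for c in chunks
                  if c.company == company and c.quarter == quarter]
    # Step 2: rank the remaining passages by token overlap with the question.
    q_tokens = tokenize(question)
    ranked = sorted(candidates,
                    key=lambda c: len(q_tokens & tokenize(c.text)),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, retrieved):
    # The prompt restricts the model to the retrieved context.
    context = "\n".join(f"[{c.company} {c.quarter}] {c.text}"
                        for c in retrieved)
    return ("Answer using only the context below. "
            "If the answer is not in the context, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")


# Made-up example passages from two hypothetical companies.
chunks = [
    Chunk("Revenue grew 8 percent year over year.", "AcmeCorp", "Q2 2023"),
    Chunk("Revenue declined 3 percent year over year.", "BetaInc", "Q2 2023"),
    Chunk("We expect capital expenditure to stay flat.", "AcmeCorp", "Q2 2023"),
]

question = "How did revenue change?"
top = retrieve(chunks, question, "AcmeCorp", "Q2 2023", k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Without the metadata filter, the two revenue passages would score identically on token overlap, and the model could be handed the wrong company's figure. This is the multi-document failure mode that the summary above attributes to missing metadata.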