BloombergGPT: A Large Language Model for Finance

Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, Gideon Mann
2023-12-21
Abstract: The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. We release Training Chronicles (Appendix C) detailing our experience in training BloombergGPT.
Machine Learning, Artificial Intelligence, Computation and Language, General Finance
What problem does this paper attempt to address?
The paper's main goal is to develop a large language model (LLM) built specifically for the financial domain, to support natural language processing tasks in financial technology (FinTech). Specifically:

1. **Specialized Needs in the Financial Sector**:
   - General-purpose LLMs such as GPT-3 are effective on a variety of tasks, but the authors note that no LLM specialized for the financial domain had been reported in the literature. The paper therefore builds a model tailored to finance-related tasks such as sentiment analysis, named entity recognition, and question answering.
2. **Mixed Dataset Training**:
   - The paper trains a single 50 billion parameter model on a mix of domain-specific financial data and general-purpose data. The authors report that this mix outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks.
3. **Large-Scale Financial Dataset**:
   - The paper constructs a financial dataset (FinPile) of 363 billion tokens drawn from Bloomberg's data sources, described as perhaps the largest domain-specific dataset yet. It is augmented with 345 billion tokens from general-purpose datasets.
4. **Evaluation Methods**:
   - The model is validated on standard LLM benchmarks, open financial benchmarks, and a suite of internal Bloomberg benchmarks that reflect its intended usage.

In summary, the paper aims to improve performance on finance-related natural language processing tasks by developing an LLM for the financial domain and by showing that mixed dataset training achieves this without giving up general-purpose capability.
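To make the data mix concrete, the two token counts quoted in the abstract can be turned into proportions. The following is a minimal sketch of that arithmetic only; the constant and function names are illustrative assumptions, and it says nothing about how the authors actually sampled or weighted the data during training.

```python
# Token counts as stated in the abstract, in billions of tokens.
FINANCIAL_TOKENS_B = 363  # domain-specific financial data
GENERAL_TOKENS_B = 345    # general-purpose datasets


def mixture_shares(financial_b, general_b):
    """Return (total, financial share, general share) for a two-part corpus."""
    total = financial_b + general_b
    return total, financial_b / total, general_b / total


total_b, fin_share, gen_share = mixture_shares(FINANCIAL_TOKENS_B, GENERAL_TOKENS_B)

print(f"total: {total_b}B tokens")            # total: 708B tokens
print(f"financial share: {fin_share:.1%}")    # financial share: 51.3%
print(f"general share: {gen_share:.1%}")      # general share: 48.7%
```

By raw token count the corpus is therefore split roughly in half between financial and general-purpose text, which is what "mixed dataset training" refers to here.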