The FinBen: an Holistic Financial Benchmark for Large Language Models
Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, Yijing Xu, Haoqiang Kang, Ziyan Kuang, Chenhan Yuan, Kailai Yang, Zheheng Luo, Tianlin Zhang, Zhiwei Liu, Guojun Xiong, Zhiyang Deng, Yuechen Jiang, Zhiyuan Yao, Haohang Li, Yangyang Yu, Gang Hu, Jiajia Huang, Xiao-Yang Liu, Alejandro Lopez-Lira, Benyou Wang, Yanzhao Lai, Hao Wang, Min Peng, Sophia Ananiadou, Jimin Huang
DOI: https://doi.org/10.48550/arxiv.2402.12659
2024-01-01
Abstract: LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, and decision-making. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading. Our evaluation of 15 representative LLMs, including GPT-4, ChatGPT, and the latest Gemini, reveals several key findings: While LLMs excel in IE and textual analysis, they struggle with advanced reasoning and complex tasks like text generation and forecasting. GPT-4 excels in IE and stock trading, while Gemini is better at text generation and forecasting. Instruction-tuned LLMs improve textual analysis but offer limited benefits for complex tasks such as QA. FinBen has been used to host the first financial LLMs shared task at the FinNLP-AgentScen workshop during IJCAI-2024, attracting 12 teams. Their novel solutions outperformed GPT-4, showcasing FinBen's potential to drive innovation in financial LLMs. All datasets, results, and codes are released for the research community: https://github.com/The-FinAI/PIXIU.