FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models

Shu Liu, Shangqing Zhao, Chenghao Jia, Xinlin Zhuang, Zhaoguang Long, Jie Zhou, Aimin Zhou, Man Lan, Qingquan Wu, Chong Yang

2024-06-14

Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of tasks. However, their proficiency and reliability in the specialized domain of financial data analysis, particularly in data-driven thinking, remain uncertain. To bridge this gap, we introduce FinDABench, a comprehensive benchmark designed to evaluate the financial data analysis capabilities of LLMs. FinDABench assesses LLMs across three dimensions: 1) Foundational Ability, evaluating the models' ability to perform financial numerical calculation and corporate sentiment risk assessment; 2) Reasoning Ability, determining the models' ability to quickly comprehend textual information and analyze abnormal financial reports; and 3) Technical Skill, examining the models' use of technical knowledge to address real-world data analysis challenges involving analysis generation and chart visualization from multiple perspectives. We will release FinDABench and the evaluation scripts at https://github.com/cubenlp/BIBench. FinDABench aims to provide a measure for in-depth analysis of LLM abilities and to foster the advancement of LLMs in the field of financial data analysis.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper proposes a new benchmark, FinDABench, to evaluate how well large language models (LLMs) perform financial data analysis. Although LLMs have demonstrated broad capabilities across many tasks, their performance in specialized domains, especially data-driven financial analysis, has not been thoroughly investigated. FinDABench tests models along three dimensions: foundational ability (financial numerical calculation and corporate sentiment risk assessment), reasoning ability (quickly comprehending textual information and analyzing abnormal financial reports), and technical skill (applying technical knowledge to generate analyses and visualize charts from multiple perspectives). The benchmark comprises six subtasks spanning classification, extraction, and generation, so that together they cover the models' financial data analysis skills comprehensively. What sets FinDABench apart is its focus on practical financial scenarios: rather than only answering questions, models must integrate information, pose relevant questions, and apply advanced techniques for in-depth data analysis and interpretation. The paper also systematically benchmarks the financial data analysis capabilities of 41 popular LLMs, which it presents as the first such evaluation, and identifies limitations of existing methods. With FinDABench, the authors aim to drive advances in LLMs for financial data analysis and to narrow the gap between general-purpose models and domain-specific requirements.
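To make the "three dimensions, six subtasks" structure concrete, the sketch below groups the six subtasks named in the abstract under their dimensions and rolls per-subtask scores up into one score per dimension. This is a minimal illustration under stated assumptions, not the paper's evaluation protocol: the subtask identifiers are hypothetical labels, the scores are assumed to be normalized to [0, 1], the unweighted mean is an assumed aggregation rule, and the example numbers are made up rather than taken from the paper's results.

```python
from statistics import mean

# Hypothetical grouping of the six subtasks under the three dimensions named in
# the abstract; the identifiers are illustrative, not FinDABench's own names.
DIMENSIONS = {
    "foundational_ability": ["numerical_calculation", "sentiment_risk_assessment"],
    "reasoning_ability": ["text_comprehension", "abnormal_report_analysis"],
    "technical_skill": ["analysis_generation", "chart_visualization"],
}


def dimension_scores(subtask_scores: dict[str, float]) -> dict[str, float]:
    """Average per-subtask scores (assumed to lie in [0, 1]) into one score per dimension.

    The unweighted mean is an assumption made for illustration only.
    """
    missing = [t for tasks in DIMENSIONS.values() for t in tasks if t not in subtask_scores]
    if missing:
        raise KeyError(f"missing subtask scores: {missing}")
    return {dim: mean(subtask_scores[t] for t in tasks) for dim, tasks in DIMENSIONS.items()}


if __name__ == "__main__":
    # Made-up scores for a single hypothetical model, not results from the paper.
    example = {
        "numerical_calculation": 0.5,
        "sentiment_risk_assessment": 0.75,
        "text_comprehension": 0.25,
        "abnormal_report_analysis": 0.75,
        "analysis_generation": 1.0,
        "chart_visualization": 0.5,
    }
    print(dimension_scores(example))
```

Reporting one score per dimension, rather than a single overall number, keeps a model's strength in one area (for example, generation) from masking weakness in another (for example, numerical calculation); a real evaluation would substitute the benchmark's own metrics and weighting.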
