Abstract: Large language models (LLMs) have achieved remarkable performance on various NLP tasks, yet their potential in more challenging, domain-specific tasks, such as those in finance, has not been fully explored. In this paper, we present CFinBench, a meticulously crafted benchmark, the most comprehensive to date, for assessing the financial knowledge of LLMs in the Chinese context. To better align with the career trajectory of Chinese financial practitioners, we build a systematic evaluation around 4 first-level categories: (1) Financial Subject: whether LLMs can memorize the necessary basic knowledge of financial subjects, such as economics, statistics and auditing. (2) Financial Qualification: whether LLMs can obtain the required financial certifications, such as certified public accountant, securities qualification and banking qualification. (3) Financial Practice: whether LLMs can carry out practical financial jobs, such as tax consultant, junior accountant and securities analyst. (4) Financial Law: whether LLMs can meet the requirements of financial laws and regulations, such as tax law, insurance law and economic law. CFinBench comprises 99,100 questions spanning 43 second-level categories with 3 question types: single-choice, multiple-choice and judgment. We conduct extensive experiments with 50 representative LLMs of various model sizes on CFinBench. The results show that GPT4 and some Chinese-oriented models lead the benchmark, with the highest average accuracy being 60.16%, highlighting the challenge presented by CFinBench. The dataset and evaluation code are available at https://cfinbench.github.io/.

CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models