A Dutch Financial Large Language Model

Sander Noels, Jorne De Blaere, Tijl De Bie
DOI: https://doi.org/10.1145/3677052.3698628
2024-10-03
Abstract: This paper presents FinGEITje, the first Dutch financial Large Language Model (LLM) specifically designed and optimized for various financial tasks. Together with the model, we release a specialized Dutch financial instruction tuning dataset with over 140,000 samples, constructed employing an automated translation and data processing method. The open-source data construction method is provided, facilitating the creation of financial instruction datasets in different languages. To evaluate model performance, the study introduces the first Dutch financial evaluation benchmark, along with an automated evaluation method that utilizes an LLM as an independent evaluator, reducing manual intervention in performance evaluation. The experimental results highlight the superior performance of FinGEITje across five critical Dutch and English financial tasks.
Computational Engineering, Finance, and Science; Computation and Language
What problem does this paper attempt to address?
This paper aims to address the deficiencies of large language models (LLMs) in the Dutch financial domain. Specifically, the authors point out that most current financial LLMs focus on English and overlook non-English languages such as Dutch. This leads to the following problems:

1. **Lack of specialized models for Dutch financial documents**: Existing general-purpose LLMs perform poorly on Dutch financial texts and fail to accurately interpret complex financial terms and concepts, which makes them inefficient and inaccurate.
2. **Scarcity of data and high annotation costs**: High-quality annotated data in the financial domain is scarce and expensive to produce, which limits the development of models for Dutch financial tasks.
3. **Lack of evaluation benchmarks**: There are currently no evaluation benchmarks specifically for Dutch financial tasks, making it difficult to evaluate model performance systematically.

To solve these problems, the paper makes the following key contributions:

- **Introducing FinGEITje**: The first open-source, transparent, and easily accessible Dutch-language financial LLM.
- **Providing a specialized dataset**: A Dutch-language financial instruction-tuning dataset with more than 140,000 samples, together with an open-source data construction method that supports the development of multilingual financial LLMs.
- **Releasing an evaluation benchmark**: The first Dutch-language financial evaluation benchmark, along with an automated evaluation method that uses an LLM as an independent evaluator to reduce human intervention.
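To make the idea of an LLM acting as an independent evaluator more concrete, here is a minimal sketch of such an evaluation loop. It is not the paper's implementation: the `Sample` structure, the prompt wording, the accuracy metric, and the example sentences are all assumptions made for illustration, and the call to an actual evaluator LLM is replaced by a simple stub (`stub_judge`) so that the code runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """One benchmark item (hypothetical structure, not from the paper)."""
    instruction: str  # task prompt given to the model under test
    reference: str    # gold answer from the benchmark
    prediction: str   # free-form output of the model under test


# A judge maps (instruction, reference, prediction) to a correct/incorrect verdict.
# In a real setup this would wrap a call to an evaluator LLM.
Judge = Callable[[str, str, str], bool]


def build_judge_prompt(instruction: str, reference: str, prediction: str) -> str:
    """Assemble the text an evaluator LLM could be shown (assumed wording)."""
    return (
        "You are grading an answer to a financial task.\n"
        f"Task: {instruction}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {prediction}\n"
        "Reply with CORRECT or INCORRECT."
    )


def stub_judge(instruction: str, reference: str, prediction: str) -> bool:
    """Stand-in for the evaluator LLM: checks whether the reference label
    appears in the prediction, ignoring case and surrounding whitespace."""
    return reference.strip().lower() in prediction.strip().lower()


def evaluate(samples: List[Sample], judge: Judge) -> float:
    """Return the fraction of samples the judge marks as correct."""
    if not samples:
        return 0.0
    correct = sum(judge(s.instruction, s.reference, s.prediction) for s in samples)
    return correct / len(samples)


# Hypothetical Dutch sentiment examples, invented for this sketch.
samples = [
    Sample("Wat is het sentiment? 'De winst steeg met 10%.'",
           "positief", "Het sentiment is positief."),
    Sample("Wat is het sentiment? 'De omzet daalde sterk.'",
           "negatief", "Neutraal."),
]
score = evaluate(samples, stub_judge)
```

Keeping the judge as a plain callable means the stub can later be swapped for a function that sends the output of `build_judge_prompt` to an evaluator model and parses its verdict, without changing `evaluate`.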
Through these contributions, FinGEITje not only fills the gap in Dutch-language financial LLMs but also lays a solid foundation for future research and applications, promoting the democratization of financial data analysis and technological progress.