XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters

Xuanyu Zhang, Qing Yang, Dongliang Xu
2023-05-20
Abstract: In recent years, pre-trained language models have undergone rapid development with the emergence of large-scale models. However, there is a lack of open-sourced chat models specifically designed for the Chinese language, especially in the field of Chinese finance, at the scale of hundreds of billions. To address this gap, we introduce XuanYuan 2.0, the largest Chinese chat model to date, built upon the BLOOM-176B architecture. Additionally, we propose a novel training method called hybrid-tuning to mitigate catastrophic forgetting. By combining general-domain with domain-specific knowledge and integrating the stages of pre-training and fine-tuning, XuanYuan 2.0 is capable of providing accurate and contextually appropriate responses in the Chinese financial domain.
Computation and Language
What problem does this paper attempt to address?
This paper has two main objectives:

1. **Filling the Market Gap**: The current market lacks open-source chat models with hundreds of billions of parameters built specifically for the Chinese language, especially for the Chinese financial sector. Existing models such as FinBERT, Mengzi, and FinT5 perform well in financial text analysis, but their parameter counts have not reached the billion level, which limits their ability to handle increasingly complex Chinese financial data.
2. **Proposing a New Training Method**: Domain-specific models often suffer from catastrophic forgetting during training. To mitigate it, the paper proposes a new training framework called hybrid-tuning. By integrating the pre-training and fine-tuning stages, this method is intended to let the model retain general-domain knowledge while acquiring specialized financial knowledge, so that it can give accurate and contextually relevant responses in Chinese financial dialogues.

In summary, the paper aims to meet the demand for high-performance language models in the Chinese financial market and to address the challenges of training domain-specific models, by developing the large-scale Chinese financial chat model XuanYuan 2.0 together with its hybrid-tuning training method.
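To make the idea of merging training stages more concrete, here is a minimal sketch of one way a hybrid-tuning style data pipeline could be assembled. It assumes, as an illustration rather than a confirmed detail of this summary, that "integrating" the stages means shuffling general and financial pre-training text together with general and financial instruction data into a single training stream, instead of running the stages one after another. The function name `build_hybrid_stream`, the pool names, and the record layout are all hypothetical.

```python
import random


def build_hybrid_stream(general_pretrain, finance_pretrain,
                        general_instruct, finance_instruct, seed=0):
    """Merge four data pools into one randomly shuffled training stream.

    Assumption: every example is a plain string, and the model would be
    trained on the merged stream in a single stage. The pool names are
    illustrative labels, not identifiers from the paper.
    """
    pools = {
        "general_pretrain": general_pretrain,
        "finance_pretrain": finance_pretrain,
        "general_instruct": general_instruct,
        "finance_instruct": finance_instruct,
    }
    # Tag each example with its source so the mix can be inspected later.
    stream = [
        {"source": name, "text": text}
        for name, items in pools.items()
        for text in items
    ]
    # A seeded generator keeps the shuffle reproducible across runs.
    random.Random(seed).shuffle(stream)
    return stream


if __name__ == "__main__":
    demo = build_hybrid_stream(
        ["general web text"],
        ["financial news text"],
        ["Q: What is a language model? A: ..."],
        ["Q: What is a bond yield? A: ..."],
        seed=42,
    )
    for example in demo:
        print(example["source"], "|", example["text"])
```

Because every batch drawn from such a stream can contain both general and financial examples, the model keeps seeing general-domain data while it learns domain knowledge, which is the intuition behind using this kind of mixing against catastrophic forgetting. Real mixing ratios and sampling strategies would need to be taken from the paper itself.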