JIANG: Chinese Open Foundation Language Model

Qinhua Duan, Wenchao Gu, Yujia Chen, Wenxin Mao, Zewen Tian, Hui Cao
2023-08-01
Abstract: With the advancement of large language model technology, these models have shown capabilities close to those of human beings across various tasks. This achievement has attracted significant interest from companies and scientific research institutions, leading to substantial investment in the research and development of such models. While numerous large models have emerged during this period, the majority have been trained primarily on English data. Although they perform decently in other languages, such as Chinese, their potential is limited by factors such as vocabulary design and training corpus. Consequently, they cannot fully express their capabilities in Chinese. To address this issue, we introduce JIANG (the Chinese pinyin for "ginger"), a model designed specifically for the Chinese language. We gathered a substantial Chinese corpus to train the model and also optimized its structure. Extensive experimental results demonstrate the excellent performance of our model.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The main goal of this paper is to introduce JIANG, a large-scale language model designed specifically for Chinese. Because most current large-scale language models are trained primarily on English corpora and therefore perform suboptimally on Chinese tasks, the authors aim to address this limitation by building a large-scale model focused on Chinese. Specifically, the paper addresses the following key issues:

1. **Language model optimized for Chinese**: Most available large language models are trained primarily on English datasets, which leads to weaker performance on Chinese tasks. The authors therefore argue that a model optimized specifically for Chinese is needed.
2. **High-quality Chinese corpus**: To train a high-performance Chinese model, the authors collected a large amount of Chinese text, including internet text, Wikipedia, and financial data, and applied strict quality control to it.
3. **Optimization of model structure and training techniques**: JIANG adopts a network design based on the Transformer architecture and modifies it in several ways, such as partially removing bias terms in fully connected layers, using RMSNorm layers, and introducing gating mechanisms, to improve the model's performance and generalization ability.
4. **Experimental validation**: The paper reports detailed experimental results showing JIANG's strong performance on multiple Chinese natural language processing tasks, especially inference tasks, where it shows a significant advantage over other models.

In summary, this research aims to advance Chinese natural language processing by developing a high-quality language model designed specifically for Chinese.
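The structural changes named in item 3 above (RMSNorm, a gating mechanism, and fully connected layers without bias terms) can be illustrated with a minimal sketch. This is not JIANG's actual implementation: the summary does not specify the gating form, so a SiLU-gated feed-forward block of the kind used in other open models is assumed here, and every function and parameter name (`rms_norm`, `gated_ffn`, `w_gate`, `w_up`, `w_down`) is hypothetical. Plain Python lists are used so the example has no dependencies.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a per-feature gain.

    Unlike LayerNorm, the mean is not subtracted and no bias is added.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x. No bias term is added."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def gated_ffn(x, w_gate, w_up, w_down):
    """Bias-free gated feed-forward block (assumed form, not taken from the paper).

    One projection is passed through SiLU and used to gate a second
    projection elementwise before projecting back down.
    """
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


if __name__ == "__main__":
    x = [3.0, 4.0]
    normed = rms_norm(x, gain=[1.0, 1.0])
    # Toy weights: model width 2, hidden width 3.
    w_gate = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
    w_up = [[0.2, 0.1], [-0.3, 0.5], [0.7, -0.1]]
    w_down = [[0.3, -0.5, 0.2], [0.1, 0.4, -0.6]]
    print(normed)
    print(gated_ffn(normed, w_gate, w_up, w_down))
```

With a gain of one, the output of `rms_norm` has a root mean square of approximately one regardless of the input scale, which is the property that makes it a cheaper substitute for LayerNorm. Because `matvec` adds no bias, a zero input to `gated_ffn` yields a zero output.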