Qwen Technical Report

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
DOI: https://doi.org/10.48550/arXiv.2309.16609
2023-09-29
Abstract: Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. The base language models consistently demonstrate superior performance across a multitude of downstream tasks, and the chat models, particularly those trained using Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The chat models possess advanced tool-use and planning capabilities for creating agent applications, showcasing impressive performance even when compared to bigger models on complex tasks like utilizing a code interpreter. Furthermore, we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat, which are built upon base language models. These models demonstrate significantly improved performance in comparison with open-source models, and slightly fall behind the proprietary models.
Computation and Language
What problem does this paper attempt to address?
This paper introduces QWEN, a large-scale language model series developed by Alibaba Group. The series comprises multiple models with different parameter counts. The main objectives are as follows:

1. **Enhance language understanding and generation capabilities**:
   - The QWEN base pretrained models are trained on a large-scale dataset and perform well across a variety of downstream tasks.
   - Supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) further improve the models' dialogue ability and alignment with human preferences.
2. **Develop specialized models**:
   - Coding-specialized models (CODE-QWEN and CODE-QWEN-CHAT) have been developed; they perform well in code generation, debugging, and interpretation.
   - A mathematics-focused model (MATH-QWEN-CHAT) has been developed; it outperforms open-source models on math-related tasks and approaches the performance of GPT-3.5.
3. **Improve the usability and scalability of the model**:
   - Optimized vocabulary design, context length extension techniques, and efficient attention mechanisms improve training and inference efficiency.
   - Some models are open-sourced, including the 14-billion-parameter and 7-billion-parameter base pretrained models and their aligned chat models, to promote research and application.
4. **Address the limitations of existing large models**:
   - The QWEN series responds to shortcomings of existing large models in reproducibility, controllability, and accessibility to service providers.
5. **Evaluate model performance**:
   - Extensive benchmark tests evaluate the QWEN models on multiple tasks, including language understanding, knowledge reasoning, code generation, and mathematical reasoning.

In summary, by presenting the design, training, and evaluation of the QWEN series, the paper demonstrates the models' broad applicability and strong performance in natural language processing.