Abstract: In this paper, we present ChatPLUG, a Chinese open-domain dialogue system for digital human applications that is instruction-finetuned on a wide range of dialogue tasks in a unified internet-augmented format. Unlike other open-domain dialogue models that focus on large-scale pre-training and on scaling up model size or the dialogue corpus, we aim to build a powerful and practical dialogue system for digital humans with diverse skills and good multi-task generalization through internet-augmented instruction tuning. To this end, we first conduct large-scale pre-training on both a common document corpus and dialogue data with curriculum learning, so as to inject world knowledge and dialogue abilities into ChatPLUG. Then, we collect a wide range of dialogue tasks spanning knowledge, personality, multi-turn memory, and empathy, on which we further instruction-tune ChatPLUG via unified natural language instruction templates. External knowledge from internet search is also used during instruction finetuning to alleviate knowledge hallucination. We show that ChatPLUG outperforms state-of-the-art Chinese dialogue systems on both automatic and human evaluation, and demonstrates strong multi-task generalization on a variety of text understanding and generation tasks. In addition, we deploy ChatPLUG in real-world applications such as smart speakers and instant messaging applications with fast inference. Our models and code will be made publicly available on ModelScope (https://modelscope.cn/models/damo/ChatPLUG-3.7B) and GitHub (https://github.com/X-PLUG/ChatPLUG).

GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation

TCMChat: A Generative Large Language Model for Traditional Chinese Medicine

PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model

ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human

GODEL: Large-Scale Pre-Training for Goal-Directed Dialog

GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot

DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization

A New Dialogue Response Generation Agent for Large Language Models by Asking Questions to Detect User's Intentions

Knowledge-Grounded Dialogue Generation with Pre-trained Language Models

CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models

MOSS: an Open Conversational Large Language Model

XDAI: A Tuning-free Framework for Exploiting Pre-trained Language Models in Knowledge Grounded Dialogue Generation

Conversations Powered by Cross-Lingual Knowledge

Are Pre-trained Language Models Knowledgeable to Ground Open Domain Dialogues?

Improving Open-Domain Dialogue Response Generation with Multi-Source Multilingual Commonsense Knowledge

LaMDA: Language Models for Dialog Applications

GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-supervised Learning and Explicit Policy Injection

Large Language Model based Situational Dialogues for Second Language Learning

Knowledge Grounded Pre-Trained Model for Dialogue Response Generation

Synthetic Dialogue Dataset Generation using LLM Agents

DialogZoo: Large-Scale Dialog-Oriented Task Learning