Abstract: The ability of a machine to communicate with humans has long been associated with the general success of AI. This association dates back to Alan Turing's epoch-making work in the early 1950s, which proposes that a machine's intelligence can be tested by how well it can fool a human, through conversation, into believing that it is a human. Many earlier systems learn generation rules from a minimal set of authored rules or labels on top of hand-coded rules or templates. They are thus both expensive to build and difficult to extend to open-domain scenarios. Recently, neural network models have shown the potential to solve many of the problems in dialogue learning that earlier systems could not tackle. End-to-end neural frameworks offer the promise of scalability and language independence. They can also track the dialogue state and map between states and dialogue actions in a way not possible with conventional systems. On the other hand, neural systems bring about new challenges: they tend to output dull and generic responses; they lack a consistent or coherent persona; they are usually optimized through single-turn conversations and are incapable of handling the long-term success of a conversation; and they are not able to take advantage of interactions with humans. This dissertation attempts to tackle these challenges, and its contributions are twofold: (1) we address the new challenges presented by neural network models in open-domain dialogue generation systems; (2) we develop interactive question-answering dialogue systems by (a) giving the agent the ability to ask questions and (b) training a conversation agent through interactions with humans in an online fashion, where the bot improves by communicating with humans and learning from the mistakes it makes.

EVA2.0: Investigating Open-domain Chinese Dialogue Systems with Large-scale Pre-training

EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training

OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts

Teaching Machines to Converse

PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model

GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation

An Empirical Investigation of Pre-Trained Transformer Language Models for Open-Domain Dialogue Generation

DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation

xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark

ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human

PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation

Re3Dial: Retrieve, Reorganize and Rescale Conversations for Long-Turn Open-Domain Dialogue Pre-training

Advancing Speech Language Models by Scaling Supervised Fine-Tuning with Over 60,000 Hours of Synthetic Speech Dialogue Data

A Unified Pre-training Framework for Conversational AI

AliCHI: A Large-scale Multi-modal Dataset and Automated Evaluation Tool for Human-like Dialogue Systems

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Towards Boosting the Open-Domain Chatbot with Human Feedback

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models

MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation

OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts

ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems