Abstract: The ability of a machine to communicate with humans has long been associated with the general success of AI. This dates back to Alan Turing's epoch-making work in the early 1950s, which proposes that a machine's intelligence can be tested by how well it can fool a human, through dialogue, into believing that it is human. Many systems learn generation rules from a minimal set of authored rules or labels on top of hand-coded rules or templates, and thus are both expensive and difficult to extend to open-domain scenarios. Recently, the emergence of neural network models has shown the potential to solve many of the problems in dialogue learning that earlier systems cannot tackle: end-to-end neural frameworks offer the promise of scalability and language-independence, together with the ability to track the dialogue state and then map between states and dialogue actions in a way not possible with conventional systems. On the other hand, neural systems bring about new challenges: they tend to output dull and generic responses; they lack a consistent or coherent persona; they are usually optimized through single-turn conversations and are incapable of handling the long-term success of a conversation; and they are not able to take advantage of interactions with humans. This dissertation attempts to tackle these challenges. Its contributions are two-fold: (1) we address new challenges presented by neural network models in open-domain dialogue generation systems; (2) we develop interactive question-answering dialogue systems by (a) giving the agent the ability to ask questions and (b) training a conversational agent through interactions with humans in an online fashion, where a bot improves by communicating with humans and learning from the mistakes that it makes.

Bot-adversarial dialogue for safe conversational agents

Adversarial Learning for Neural Dialogue Generation

Teaching Machines to Converse

Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents

Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots

Robust Conversational Agents against Imperceptible Toxicity Triggers

Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls

Learn What NOT to Learn: Towards Generative Safety in Chatbots

Improving Dialog Safety using Socially Aware Contrastive Learning

Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack

Using In-Context Learning to Improve Dialogue Safety

Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling

Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation

ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation

ProsocialDialog: A Prosocial Backbone for Conversational Agents

Recent Advances towards Safe, Responsible, and Moral Dialogue Systems: A Survey

Toxicity in ChatGPT: Analyzing Persona-assigned Language Models

Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT

On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark

An Adversarially-Learned Turing Test for Dialog Generation Models

Poison Attacks and Adversarial Prompts Against an Informed University Virtual Assistant