Abstract: Today, most large-scale conversational AI agents (e.g. Alexa, Siri, or Google Assistant) are built using manually annotated data to train the different components of the system. Typically, the accuracy of the ML models in these components is improved by manually transcribing and annotating data. As the scope of these systems increases to cover more scenarios and domains, manual annotation to improve the accuracy of these components becomes prohibitively costly and time-consuming. In this paper, we propose a system that leverages user-system interaction feedback signals to automate learning without any manual annotation. In these interactions, users tend to modify a previous query in hopes of fixing an error in the previous turn and getting the right results. These reformulations are often preceded by defective experiences caused by errors in ASR, NLU, ER, or the application. In some cases, users may not properly formulate their requests (e.g. providing a partial song title), but gleaning across a wider pool of users and sessions reveals the underlying recurrent patterns. Our proposed self-learning system automatically detects the errors, generates reformulations, and deploys fixes to the runtime system to correct different types of errors occurring in different components of the system. In particular, we propose leveraging an absorbing Markov chain model as a collaborative filtering mechanism in a novel attempt to mine these patterns. We show that our approach is highly scalable and able to learn reformulations that reduce Alexa user errors by pooling anonymized data across millions of customers. The proposed self-learning system achieves a win/loss ratio of 11.8 and reduces the defect rate by more than 30% on utterance-level reformulations in our production A/B tests. To the best of our knowledge, this is the first self-learning large-scale conversational AI system in production.
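The abstract does not spell out the model, so the following is only a minimal sketch of how an absorbing Markov chain can pool reformulation sessions across users. It assumes that each session is a list of utterance strings, that some defect signal (here the hypothetical `successful` set) marks which utterances ended well, that utterances are transient states, and that a session's last utterance moves to a hypothetical absorbing state: `("<ok>", u)` on success or `FAIL` otherwise. Absorption probabilities B = (I - Q)^-1 R are obtained by fixed-point iteration of B = R + QB, and a defective utterance is mapped to the successful utterance that absorbs it most often. The function names, the state encoding, and the `min_prob` threshold are illustrative assumptions, not the paper's.

```python
from collections import defaultdict

# Hypothetical absorbing state for sessions whose last utterance was defective.
FAIL = ("<fail>",)


def transition_probs(sessions, successful):
    """Estimate transition probabilities from sessions pooled across users.

    Transient states are utterances. A session's last utterance moves to an
    absorbing state: ("<ok>", u) if u is in `successful`, else FAIL.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1  # user reformulated cur into nxt
        last = session[-1]
        end = ("<ok>", last) if last in successful else FAIL
        counts[last][end] += 1
    return {
        src: {dst: c / sum(outs.values()) for dst, c in outs.items()}
        for src, outs in counts.items()
    }


def absorption_probs(probs, tol=1e-12, max_iter=10_000):
    """Solve B = R + Q B by Jacobi iteration, i.e. B = (I - Q)^-1 R."""
    B = {s: {} for s in probs}
    for _ in range(max_iter):
        new_B, delta = {}, 0.0
        for s, outs in probs.items():
            row = defaultdict(float)
            for dst, p in outs.items():
                if dst in probs:  # transient neighbour: inherit its absorption mass
                    for a, q in B[dst].items():
                        row[a] += p * q
                else:  # absorbing neighbour
                    row[dst] += p
            for a in set(row) | set(B[s]):
                delta = max(delta, abs(row.get(a, 0.0) - B[s].get(a, 0.0)))
            new_B[s] = dict(row)
        B = new_B
        if delta < tol:
            break
    return B


def best_rewrite(B, source, min_prob=0.5):
    """Return (target, prob) for the successful utterance most likely to absorb source."""
    candidates = [
        (p, a[1]) for a, p in B.get(source, {}).items()
        if a != FAIL and a[1] != source
    ]
    if not candidates:
        return None
    p, target = max(candidates)
    return (target, p) if p >= min_prob else None


# Toy sessions (made up for illustration): a partial request and its fixes.
sessions = [
    ["play maj and dragons", "play imagine dragons"],
    ["play maj and dragons", "play imagine dragons"],
    ["play maj and dragons", "play magic dragons", "play imagine dragons"],
    ["play maj and dragons"],  # user gave up
]
successful = {"play imagine dragons"}
B = absorption_probs(transition_probs(sessions, successful))
print(best_rewrite(B, "play maj and dragons"))  # ('play imagine dragons', 0.75)
```

Because counts are pooled across sessions, the partial request is linked to "play imagine dragons" both directly and through the indirect path via "play magic dragons", illustrating the kind of collaborative-filtering effect the abstract describes. The iteration converges here because every utterance observed in a session can reach an absorbing state.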

What problem does this paper attempt to address?

The paper addresses how to use feedback signals from customer-system interactions to automate learning in large-scale conversational AI agents without any manual annotation. As these systems expand to cover more scenarios and domains, improving the accuracy of components such as automatic speech recognition (ASR), natural language understanding (NLU), and entity resolution (ER) by manually transcribing and annotating data becomes costly and time-consuming. The paper proposes a system that automatically detects errors, generates reformulations, and deploys fixes to the runtime system to correct different types of errors in different components. In particular, the authors propose using an absorbing Markov chain model as a collaborative filtering mechanism to mine recurrent reformulation patterns. The paper shows that this method is highly scalable and can reduce Alexa user errors by pooling anonymized data from millions of customers. In production A/B tests, the self-learning system achieved a win/loss ratio of 11.8 and reduced the defect rate by more than 30% on utterance-level reformulations. To the authors' knowledge, this is the first self-learning large-scale conversational AI system in production.
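The summary says mined fixes are deployed to the runtime system but does not describe the serving mechanism. The sketch below assumes a simple key-value rewrite table consulted before downstream interpretation: a rewrite is published only above a confidence threshold, can be withdrawn later, and unmatched utterances pass through unchanged. `Rewrite`, `RewriteTable`, `_normalize`, and `min_confidence` are hypothetical names, not the paper's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rewrite:
    """A mined fix: the reformulation to substitute and its confidence."""
    target: str
    confidence: float


def _normalize(utterance: str) -> str:
    # Hypothetical key normalization: lowercase and collapse whitespace.
    return " ".join(utterance.lower().split())


class RewriteTable:
    """Hypothetical runtime store of offline-mined rewrites."""

    def __init__(self, min_confidence: float = 0.5):
        self.min_confidence = min_confidence
        self._table = {}  # normalized source utterance -> Rewrite

    def deploy(self, source: str, rewrite: Rewrite) -> bool:
        """Publish a rewrite only if it clears the confidence threshold."""
        if rewrite.confidence < self.min_confidence:
            return False
        self._table[_normalize(source)] = rewrite
        return True

    def rollback(self, source: str) -> None:
        """Withdraw a rewrite, e.g. if it underperforms in an A/B test."""
        self._table.pop(_normalize(source), None)

    def apply(self, utterance: str) -> str:
        """Return the text downstream components should interpret."""
        rewrite = self._table.get(_normalize(utterance))
        return rewrite.target if rewrite else utterance


table = RewriteTable(min_confidence=0.5)
table.deploy("play maj and dragons", Rewrite("play imagine dragons", 0.75))
table.deploy("play hello", Rewrite("play hello by adele", 0.2))  # rejected: below threshold
print(table.apply("Play  maj and dragons"))  # play imagine dragons
print(table.apply("play hello"))  # play hello
```

Keeping the table separate from the models means a fix can be added or withdrawn without retraining ASR or NLU, which matches the summary's point that no manual annotation is required.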

Feedback-Based Self-Learning in Large-Scale Conversational AI Agents

A Self-Learning Framework for Large-Scale Conversational AI Systems

Dialogue Learning with Human-in-the-Loop

Teaching Machines to Converse

Scalable and Safe Remediation of Defective Actions in Self-Learning Conversational Systems

Learning from Naturally Occurring Feedback

An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems

A Base Camp for Scaling AI

FastLearn: A Rapid Learning Agent for Chat Models to Acquire Latest Knowledge

Lifelong Learning Dialogue Systems: Chatbots that Self-Learn On the Job

UltraFeedback: Boosting Language Models with Scaled AI Feedback

Conversate: Supporting Reflective Learning in Interview Practice Through Interactive Simulation and Dialogic Feedback

Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations

Using Generative AI and Multi-Agents to Provide Automatic Feedback

Exploiting Simulated User Feedback for Conversational Search: Ranking, Rewriting, and Beyond

Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System

Large-scale Hybrid Approach for Predicting User Satisfaction with Conversational Agents

Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

Simulating User Agents for Embodied Conversational-AI

Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA