RAG based Question-Answering for Contextual Response Prediction System

Sriram Veturi, Saurabh Vaichal, Reshma Lal Jagadheesh, Nafis Irtiza Tripto, Nian Yan
2024-09-06
Abstract: Large Language Models (LLMs) have shown versatility in various Natural Language Processing (NLP) tasks, including their potential as effective question-answering systems. However, to provide precise and relevant information in response to specific customer queries in industry settings, LLMs require access to a comprehensive knowledge base to avoid hallucinations. Retrieval Augmented Generation (RAG) emerges as a promising technique to address this challenge. Yet, developing an accurate question-answering framework for real-world applications using RAG entails several challenges: 1) data availability issues, 2) evaluating the quality of generated content, and 3) the costly nature of human evaluation. In this paper, we introduce an end-to-end framework that employs LLMs with RAG capabilities for industry use cases. Given a customer query, the proposed system retrieves relevant knowledge documents and leverages them, along with previous chat history, to generate response suggestions for customer service agents in the contact centers of a major retail company. Through comprehensive automated and human evaluations, we show that this solution outperforms the current BERT-based algorithms in accuracy and relevance. Our findings suggest that RAG-based LLMs can be an excellent support to human customer service representatives by lightening their workload.
Computation and Language, Information Retrieval
What problem does this paper attempt to address?
The paper addresses the following problem: in industrial settings, large language models (LLMs) perform well across many natural language processing tasks, yet they tend to generate inaccurate information when answering specific customer queries. This happens because LLMs rely on patterns learned from their training data, which may not contain the domain-specific knowledge a query requires. To overcome this, the paper proposes a framework based on Retrieval-Augmented Generation (RAG) for building a knowledge-grounded response prediction system for the contact centers of a major retail company. The work proceeds in four parts:

1. **Data Preparation**: The authors created a dataset of relevant question-answer pairs and their corresponding knowledge documents.
2. **Evaluation of Retrieval Strategies and Embedding Methods**: They compared embedding strategies (Universal Sentence Encoder, Google’s Vertex AI, and SBERT-all-mpnet-base-v2) and retrieval strategies (ScaNN and KNN HNSW) to determine the best configuration.
3. **Optimization of Generation Models**: They used the PaLM2 foundation model for text generation and tested different prompting techniques so that the LLM generates fact-based, relevant responses.
4. **Evaluation and Deployment**: They verified the RAG LLM system through automated and human evaluations, showing that it outperforms the existing BERT-based baseline in accuracy and relevance, and deployed it to production to provide real-time support for customer service representatives.

Overall, the study demonstrates how RAG can improve the performance of LLMs in practical applications, particularly by reducing hallucinations and improving response accuracy.
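The retrieve-then-generate flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the knowledge base is a hypothetical three-document toy, `embed` is a bag-of-words stand-in for a learned encoder such as Universal Sentence Encoder or SBERT, `retrieve` does exhaustive cosine search where the paper uses ScaNN or HNSW, and the prompt wording is an assumption. The sketch stops at prompt assembly and makes no LLM call.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base, for illustration only.
KNOWLEDGE_BASE = {
    "returns": "Items can be returned within 90 days with a receipt for a full refund.",
    "delivery": "Standard delivery takes 3 to 5 business days after the order ships.",
    "warranty": "Appliances include a one year manufacturer warranty covering defects.",
}


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned sentence encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, kb, k=2):
    """Exhaustive top-k search; an ANN index would approximate this at scale."""
    q = embed(query)
    ranked = sorted(kb, key=lambda doc_id: cosine(q, embed(kb[doc_id])), reverse=True)
    return ranked[:k]


def build_prompt(query, chat_history, kb, k=2):
    """Combine retrieved documents and prior turns into one grounded prompt."""
    context = "\n".join(f"[{doc_id}] {kb[doc_id]}" for doc_id in retrieve(query, kb, k))
    history = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in chat_history)
    return (
        "Answer using only the knowledge documents below. "
        "If they do not contain the answer, say so.\n\n"
        f"Knowledge documents:\n{context}\n\n"
        f"Chat history:\n{history}\n\n"
        f"Customer: {query}\nAgent suggestion:"
    )


history = [
    ("Customer", "Hi, I bought a drill last week."),
    ("Agent", "Thanks for reaching out. How can I help?"),
]
print(build_prompt("Can I return it for a refund?", history, KNOWLEDGE_BASE))
```

In a production setting the resulting prompt would be sent to the generation model, and the instruction to answer only from the supplied documents is one common way to discourage hallucinated answers.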