Pistis-RAG: Enhancing Retrieval-Augmented Generation with Human Feedback

Yu Bai, Yukai Miao, Li Chen, Dawei Wang, Dan Li, Yanyu Ren, Hongtao Xie, Ce Yang, Xuhui Cai

2024-10-31

Abstract: RAG systems face limitations when semantic relevance alone does not guarantee improved generation quality. This issue becomes particularly evident due to the sensitivity of large language models (LLMs) to the ordering of few-shot prompts, which can affect model performance. To address this challenge, aligning LLM outputs with human preferences using structured feedback, such as options to copy, regenerate, or dislike, offers a promising method for improvement. This feedback is applied to the entire list of inputs rather than giving specific ratings for individual documents, making it a Listwide Labels Learning-to-Rank task. To address this task, we propose Pistis-RAG, a new RAG framework designed with a content-centric approach to better align LLMs with human preferences. Pistis-RAG effectively utilizes human feedback, enhancing content ranking and generation quality. To validate our framework, we use public datasets to simulate human feedback, allowing us to evaluate and refine our method effectively. Experimental results indicate that Pistis-RAG improves alignment with human preferences relative to the baseline RAG system, showing a 6.06% increase in MMLU (English) and a 7.08% increase in C-EVAL (Chinese) accuracy metrics. These results highlight Pistis-RAG's effectiveness in overcoming the limitations associated with traditional RAG approaches.

Information Retrieval, Computation and Language

What problem does this paper attempt to address?

The paper addresses a limitation of existing RAG (Retrieval-Augmented Generation) systems: relying on semantic relevance alone does not guarantee better generation quality. In particular, large language models (LLMs) are sensitive to the order of few-shot prompts, and this ordering can affect model performance. To address this, the paper proposes a new RAG framework, Pistis-RAG, which aims to better align LLM output with human preferences through structured user feedback (such as copy, regenerate, or dislike options). This feedback applies to the entire input list rather than scoring individual documents, so the paper treats it as a learning-to-rank task with "list-level labels".

Pistis-RAG operates in two main stages: feedback alignment and online query. In the feedback alignment stage, human feedback is used through online learning to make the ranking model more sensitive to human and LLM preferences and to adapt to changing expectations. In the query stage, Pistis-RAG uses a true ranker to re-order the retrieved content according to the optimized ranking model, taking into account both semantic relevance and the order in which content is presented to the LLM, so that the final output conforms to human preferences and is consistent with the LLM's generation ability.

The paper validates the method by simulating human feedback on public datasets. The experimental results show accuracy improvements of 6.06% on MMLU (English) and 7.08% on C-EVAL (Chinese) relative to the baseline RAG system, which the authors present as evidence that the framework overcomes limitations of traditional RAG methods.
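To make the idea of list-level feedback concrete, the following is a minimal Python sketch and not the paper's implementation. Everything in it is an assumption for illustration: the `ListwideRanker` class, the linear scoring function, the reward values assigned to copy, regenerate, and dislike, and the 1/(position) credit rule are all hypothetical. It only shows the general shape of the task: one feedback signal arrives for a whole ordered list, the ranker is updated online from it, and later queries are re-ordered with the updated ranker.

```python
from dataclasses import dataclass

# Hypothetical mapping from list-level feedback to a scalar reward.
# The specific values are assumptions, not taken from the paper.
FEEDBACK_REWARD = {"copy": 1.0, "regenerate": -0.5, "dislike": -1.0}


@dataclass
class ListwideRanker:
    """Toy linear ranker updated from feedback on a whole ranked list."""
    weights: list
    lr: float = 0.1

    def score(self, features):
        # Linear score: dot product of weights and document features.
        return sum(w * x for w, x in zip(self.weights, features))

    def rerank(self, docs):
        # docs is a list of (doc_id, features); highest score first.
        # sorted() is stable, so ties keep their incoming order.
        return sorted(docs, key=lambda d: self.score(d[1]), reverse=True)

    def update(self, shown, feedback):
        # One label for the entire list: spread it over the documents,
        # giving earlier positions more credit (assumed 1/(pos+1) rule).
        reward = FEEDBACK_REWARD[feedback]
        for pos, (_, features) in enumerate(shown):
            credit = 1.0 / (pos + 1)
            for j, x in enumerate(features):
                self.weights[j] += self.lr * reward * credit * x


# Usage: two documents with made-up two-dimensional features.
docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0])]
ranker = ListwideRanker(weights=[0.0, 0.0])

shown = ranker.rerank(docs)          # all scores tie, so order stays a, b
ranker.update(shown, "dislike")      # the user disliked the whole answer
reordered = ranker.rerank(docs)      # b now outranks a
print([doc_id for doc_id, _ in reordered])
```

In this toy setup a single "dislike" penalizes the document in the first position more than the one in the second, so the order flips on the next query. A real system would need a better credit-assignment scheme than this positional heuristic, since the list-level label does not say which document caused the poor answer.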
