A Multi-Source Retrieval Question Answering Framework Based on RAG

Ridong Wu, Shuhong Chen, Xiangbiao Su, Yuankai Zhu, Yifei Liao, Jianming Wu
2024-05-29
Abstract: With the rapid development of large language models, Retrieval-Augmented Generation (RAG) has been widely adopted. However, existing RAG paradigms are inevitably influenced by erroneous retrieved information, which reduces the reliability and correctness of the generated results. To improve the relevance of retrieved information, this study proposes a method that replaces the traditional retriever with GPT-3.5, leveraging its vast corpus knowledge to generate retrieval information. We also propose a web-retrieval-based method for fine-grained knowledge retrieval, which uses the reasoning capability of GPT-3.5 to semantically partition the problem. To mitigate hallucination in GPT retrieval and reduce noise in web retrieval, we propose a multi-source retrieval framework, named MSRAG, which combines GPT retrieval with web retrieval. Experiments on multiple knowledge-intensive QA datasets demonstrate that the proposed framework outperforms existing RAG frameworks in the overall efficiency and accuracy of QA systems.
Information Retrieval, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses two weaknesses of traditional Retrieval-Augmented Generation (RAG) frameworks: retrieved information that is not relevant to the question, and noise in the retrieved information. Erroneous retrieval results lower the reliability and accuracy of the generated answers. To counter this, the paper proposes a framework called MSRAG (Multi-Source Retrieval Augmented Generation).
First, MSRAG replaces the traditional retriever with GPT-3.5, which generates retrieval information from its vast corpus knowledge in order to improve relevance. Second, it achieves fine-grained knowledge retrieval through a web-based retrieval approach that uses GPT-3.5's reasoning capability to segment the question semantically. To mitigate hallucination in GPT retrieval and reduce noise from web retrieval, MSRAG combines the two into a multi-source retrieval framework. Experiments on multiple knowledge-intensive question answering datasets show that MSRAG outperforms other RAG frameworks in the overall efficiency and accuracy of question answering systems.
The specific contributions are as follows:
1. Use GPT-3.5 for retrieval to improve information relevance, and achieve semantic segmentation of queries through a web retrieval framework.
2. Propose a multi-source retrieval method that combines GPT retrieval with web retrieval to reduce erroneous information and noise.
3. Report experimental results on different datasets showing that MSRAG improves performance on question answering tasks.
The paper also conducts ablation studies to validate the effectiveness of GPT retrieval and the advantages of the integrated MSRAG framework. Future research may focus on improving the performance of GPT retrieval and on reducing the operational cost and runtime of the integrated approach.
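The multi-source idea summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Passage`, `multi_source_retrieve`, and `build_prompt` are hypothetical, the three callables stand in for GPT-3.5 passage generation, GPT-3.5 question segmentation, and a web search backend, and merging by simple concatenation with de-duplication is an assumption, since the summary does not say how the two sources are combined.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Passage:
    text: str
    source: str  # "gpt" for generated passages, "web" for search snippets


def multi_source_retrieve(
    question: str,
    generate_passage: Callable[[str], str],      # assumed wrapper around an LLM call
    split_question: Callable[[str], List[str]],  # assumed LLM-based semantic segmentation
    web_search: Callable[[str], List[str]],      # assumed wrapper around a search API
    top_k: int = 2,
) -> List[Passage]:
    """Collect one generated passage plus web snippets for each sub-question."""
    passages = [Passage(generate_passage(question), "gpt")]
    seen = set()
    for sub_question in split_question(question):
        for snippet in web_search(sub_question)[:top_k]:
            if snippet not in seen:  # drop verbatim duplicates across sub-questions
                seen.add(snippet)
                passages.append(Passage(snippet, "web"))
    return passages


def build_prompt(question: str, passages: List[Passage]) -> str:
    """Format the merged passages as numbered context for the answering model."""
    context = "\n".join(
        f"[{i + 1}] ({p.source}) {p.text}" for i, p in enumerate(passages)
    )
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Stand-in functions so the sketch runs without any external service.
def fake_generate(question: str) -> str:
    return f"generated background for: {question}"


def fake_split(question: str) -> List[str]:
    return [part.strip() for part in question.rstrip("?").split(" and ")]


def fake_search(sub_question: str) -> List[str]:
    return [f"web result A for {sub_question}", f"web result B for {sub_question}"]
```

Calling `multi_source_retrieve("Who founded the company and where is it based?", fake_generate, fake_split, fake_search, top_k=1)` returns one `"gpt"` passage followed by one `"web"` passage per sub-question. In a real system the stand-ins would be replaced by actual model and search calls, and the merge step is where any filtering of hallucinated or noisy passages would go.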