Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs

Ronit Singhal, Pransh Patwa, Parth Patwa, Aman Chadha, Amitava Das
2024-10-05
Abstract: Given the widespread dissemination of misinformation on social media, implementing fact-checking mechanisms for online claims is essential. Manually verifying every claim is very challenging, underscoring the need for an automated fact-checking system. This paper presents our system designed to address this issue. We utilize the Averitec dataset (Schlichtkrull et al., 2023) to assess the performance of our fact-checking system. In addition to veracity prediction, our system provides supporting evidence, which is extracted from the dataset. We develop a Retrieve and Generate (RAG) pipeline to extract relevant evidence sentences from a knowledge base, which are then fed, along with the claim, into a large language model (LLM) for classification. We also evaluate the few-shot In-Context Learning (ICL) capabilities of multiple LLMs. Our system achieves an 'Averitec' score of 0.33, which is a 22% absolute improvement over the baseline. Our code is publicly available at https://github.com/ronit-singhal/evidence-backed-fact-checking-using-rag-and-few-shot-in-context-learning-with-llms.
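The retrieve-then-classify flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the summary does not specify their retriever, prompt, or model, so here a bag-of-words cosine similarity stands in for the retriever, `llm` is a hypothetical callable that maps a prompt to completion text, and the label set is assumed to be the four AVeriTeC verdicts.

```python
import math
import re
from collections import Counter
from typing import Callable

# Assumed verdict set (the four AVeriTeC labels); not listed in this summary.
LABELS = [
    "Supported",
    "Refuted",
    "Not Enough Evidence",
    "Conflicting Evidence/Cherrypicking",
]


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    # Counter returns 0 for missing keys, so absent terms contribute nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_evidence(claim: str, knowledge_base: list[str], k: int = 3) -> list[str]:
    # Retrieval step (stand-in retriever): rank knowledge-base sentences by
    # similarity to the claim and keep the top k. Ties keep original order.
    claim_vec = Counter(tokenize(claim))
    scored = [
        (cosine(claim_vec, Counter(tokenize(sentence))), idx, sentence)
        for idx, sentence in enumerate(knowledge_base)
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sentence for _, _, sentence in scored[:k]]


def classify_claim(
    claim: str,
    knowledge_base: list[str],
    llm: Callable[[str], str],
    k: int = 3,
) -> tuple[str, list[str]]:
    # Generation step: pass the claim plus retrieved evidence to the LLM.
    evidence = retrieve_evidence(claim, knowledge_base, k)
    evidence_block = "\n".join(f"- {e}" for e in evidence)
    prompt = (
        f"Claim: {claim}\n"
        f"Evidence:\n{evidence_block}\n"
        f"Answer with one label from: {', '.join(LABELS)}.\nLabel:"
    )
    response = llm(prompt).lower()
    # Return the first known label mentioned in the response; fall back to
    # "Not Enough Evidence" (a hypothetical default) if none is found.
    for label in LABELS:
        if label.lower() in response:
            return label, evidence
    return "Not Enough Evidence", evidence


# Usage with a stub LLM that always answers "Supported".
kb = [
    "The Eiffel Tower is located in Paris, France.",
    "Bananas are a good source of potassium.",
    "Paris is the capital of France.",
]
label, evidence = classify_claim(
    "The Eiffel Tower is in Paris.", kb, llm=lambda prompt: "Supported", k=2
)
# label == "Supported"; evidence == [kb[0], kb[2]]
```

Returning the evidence alongside the label mirrors the system's stated goal of pairing each verdict with supporting evidence; in practice the stub would be replaced by a call to a real LLM.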
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the widespread dissemination of misinformation on social media platforms. Because manually verifying every online claim requires an enormous amount of work, the paper develops an automated fact-checking system. Beyond predicting a claim's veracity, the system also provides supporting evidence, which makes its decisions transparent and helps build public trust. The main contributions of the paper are:
1. An automated fact-checking system that combines Retrieval-Augmented Generation (RAG) with few-shot In-Context Learning (ICL).
2. A system that needs only a small number of labelled examples, removing the need for large-scale manually annotated training datasets.
3. Experiments with several state-of-the-art Large Language Models (LLMs), together with a comprehensive analysis of the results.
Together, these contributions yield a fact-checking approach that is both effective (an Averitec score of 0.33, a 22% absolute improvement over the baseline) and data-efficient.
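In few-shot ICL, the LLM is not fine-tuned. Instead, a handful of labelled examples are placed in the prompt ahead of the claim to be checked, which is why only a few training samples are needed. The sketch below shows one way such a prompt could be assembled, assuming each demonstration is a (claim, evidence, label) triple. The `Example` type, the one-example-per-label selection in `select_demonstrations`, the prompt layout, and the label set are all illustrative assumptions, not the authors' actual prompt.

```python
from dataclasses import dataclass

# Assumed verdict set (the four AVeriTeC labels); not listed in this summary.
LABELS = [
    "Supported",
    "Refuted",
    "Not Enough Evidence",
    "Conflicting Evidence/Cherrypicking",
]


@dataclass
class Example:
    # One labelled demonstration: claim, evidence sentences, gold verdict.
    claim: str
    evidence: list[str]
    label: str


def format_example(claim: str, evidence: list[str], label: str = "") -> str:
    # Render one claim/evidence block; an empty label leaves "Label:" open
    # for the model to complete.
    evidence_text = "\n".join(f"- {sentence}" for sentence in evidence)
    return f"Claim: {claim}\nEvidence:\n{evidence_text}\nLabel: {label}".rstrip()


def select_demonstrations(pool: list[Example], per_label: int = 1) -> list[Example]:
    # Hypothetical strategy: keep the first `per_label` examples of each label,
    # preserving pool order, so each verdict present in the pool is shown.
    counts: dict[str, int] = {}
    chosen: list[Example] = []
    for example in pool:
        if counts.get(example.label, 0) < per_label:
            chosen.append(example)
            counts[example.label] = counts.get(example.label, 0) + 1
    return chosen


def build_few_shot_prompt(demos: list[Example], claim: str, evidence: list[str]) -> str:
    # Instruction, then labelled demonstrations, then the unlabelled query.
    instruction = "Classify the final claim as one of: " + ", ".join(LABELS) + "."
    parts = [instruction]
    parts.extend(format_example(d.claim, d.evidence, d.label) for d in demos)
    parts.append(format_example(claim, evidence))
    return "\n\n".join(parts)


# Usage: two demonstrations are selected from a small labelled pool.
pool = [
    Example("Water boils at 100 C at sea level.",
            ["At sea level, water boils at 100 degrees Celsius."], "Supported"),
    Example("The Moon is made of cheese.",
            ["Lunar samples consist of rock and dust."], "Refuted"),
    Example("Water freezes at 0 C at sea level.",
            ["Pure water freezes at 0 degrees Celsius at standard pressure."], "Supported"),
]
demos = select_demonstrations(pool, per_label=1)  # pool[0] and pool[1]
prompt = build_few_shot_prompt(
    demos,
    "The Eiffel Tower is in Paris.",
    ["The Eiffel Tower is located in Paris, France."],
)
```

The resulting prompt string would be sent to each LLM under evaluation, and the label the model writes after the final "Label:" is taken as its prediction.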