HerO at AVeriTeC: The Herd of Open Large Language Models for Verifying Real-World Claims

Yejun Yoon, Jaeyoon Jung, Seunghyun Yoon, Kunwoo Park
2024-10-20
Abstract: To tackle the AVeriTeC shared task hosted by FEVER-24, we introduce a system that employs only publicly available large language models (LLMs) for each step of automated fact-checking, dubbed the Herd of Open LLMs for verifying real-world claims (HerO). For evidence retrieval, a language model enhances the query by generating hypothetical fact-checking documents. We prompt pretrained and fine-tuned LLMs for question generation and veracity prediction by crafting prompts with retrieved in-context samples. HerO achieved 2nd place on the leaderboard with an AVeriTeC score of 0.57, suggesting the potential of open LLMs for verifying real-world claims. For future research, we make our code publicly available at https://github.com/ssu-humane/HerO.
Computation and Language, Computers and Society
What problem does this paper attempt to address?
The paper addresses the problem of automatically verifying the veracity of real-world claims. It introduces HerO (the Herd of Open LLMs for verifying real-world claims), a system that uses only publicly available large language models (LLMs) for every step of automated fact-checking: evidence retrieval, question generation, and veracity prediction. The paper points out that existing fact-checking datasets have critical issues such as context dependency, insufficient evidence, and temporal leakage, which cause existing systems to perform poorly on real-world claims. The study therefore aims to build a system that verifies real-world claims effectively using recent open LLMs. HerO placed second in the AVeriTeC shared task with an AVeriTeC score of 0.57, which suggests that open LLMs have potential for real-world claim verification. The authors have also released their code publicly to support future research.
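
The abstract's evidence retrieval step, in which a language model generates hypothetical fact-checking documents to enrich the query, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the HerO code: `stub_generate_doc` stands in for an LLM call, the function names are hypothetical, and the bag-of-words cosine ranking is a toy substitute for whatever retriever the system actually uses.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def bow(text: str) -> Counter:
    """Lowercased bag-of-words counts (toy text representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(
    claim: str,
    corpus: List[str],
    generate_doc: Callable[[str], str],
    k: int = 2,
) -> List[str]:
    """Rank corpus documents against the claim plus a hypothetical document."""
    # Query expansion: append the generated hypothetical fact-checking text.
    query = bow(claim + " " + generate_doc(claim))
    ranked = sorted(corpus, key=lambda d: cosine(query, bow(d)), reverse=True)
    return ranked[:k]


def stub_generate_doc(claim: str) -> str:
    """Hypothetical stand-in for an LLM; returns a fixed fact-check passage."""
    return (
        "Fact check: official statistics on the unemployment rate "
        "were reviewed by the labor bureau."
    )


corpus = [
    "The labor bureau published official statistics showing the jobless figure.",
    "A recipe for tomato soup with basil.",
    "Football results from this weekend.",
]
top = retrieve("The unemployment rate fell last year.", corpus, stub_generate_doc)
print(top[0])
```

The point of the expansion is that the generated passage introduces vocabulary ("labor bureau", "official statistics") that the short claim lacks, so relevant evidence can match even when it shares few words with the claim itself.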