Abstract: Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when solely relying on their internal knowledge, especially when answering questions that require less commonly known information. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge. Nonetheless, recent approaches have primarily emphasized retrieval from unstructured text corpora, owing to its seamless integration into prompts. When using structured data such as knowledge graphs, most methods simplify it into natural text, neglecting the underlying structures. Moreover, a significant gap in the current landscape is the absence of a realistic benchmark for evaluating the effectiveness of grounding LLMs on heterogeneous knowledge sources (e.g., knowledge base and text). To fill this gap, we have curated a comprehensive dataset that poses two unique challenges: (1) Two-hop multi-source questions that require retrieving information from both open-domain structured and unstructured knowledge sources; retrieving information from structured knowledge sources is a critical component in correctly answering the questions. (2) The generation of symbolic queries (e.g., SPARQL for Wikidata) is a key requirement, which adds another layer of challenge. Our dataset is created using a combination of automatic generation through predefined reasoning chains and human annotation. We also introduce a novel approach that leverages multiple retrieval tools, including text passage retrieval and symbolic language-assisted retrieval. Our model outperforms previous approaches by a significant margin, demonstrating its effectiveness in addressing the above-mentioned reasoning challenges.

What problem does this paper attempt to address?

This paper addresses how to evaluate the reasoning ability of large language models (LLMs) when they must combine a knowledge base with text in open-domain question answering. Existing retrieval-augmented methods focus mainly on retrieving from unstructured text corpora and, when they do use structured data such as knowledge graphs, typically flatten it into natural text and neglect its underlying structure. There is also no realistic benchmark for assessing how well LLMs are grounded on heterogeneous knowledge sources (e.g., knowledge bases and text). To fill this gap, the paper introduces a dataset, DIVKNOWQA, that poses two challenges: (1) two-hop multi-source questions that require retrieving information from both open-domain structured and unstructured knowledge sources; and (2) generating symbolic queries (such as SPARQL for Wikidata), which adds a further layer of difficulty. The authors also propose an approach that uses multiple retrieval tools, including text passage retrieval and symbolic language-assisted retrieval, with the aim of evaluating and improving how LLMs handle complex, multi-source information.
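To make the two challenges concrete, the following is a minimal sketch of a two-hop multi-source lookup. It is not the authors' method or data: the function names, the in-memory `TOY_KB` dictionary, and the `TOY_PASSAGES` list are all hypothetical stand-ins, the SPARQL string is only built (never sent to an endpoint), and the text retriever is a simple word-overlap ranker chosen for brevity.

```python
# Hypothetical toy data: a one-entry "knowledge base" and two text passages.
# Identifiers follow Wikidata's naming style, but nothing here queries Wikidata.
TOY_KB = {("Q42", "P27"): "United Kingdom"}

TOY_PASSAGES = [
    "London is the capital of the United Kingdom.",
    "Douglas Adams wrote The Hitchhiker's Guide to the Galaxy.",
]


def build_sparql(entity_id: str, property_id: str) -> str:
    """Build the symbolic query for the structured hop (illustrative template)."""
    return f"SELECT ?o WHERE {{ wd:{entity_id} wdt:{property_id} ?o . }}"


def kb_lookup(entity_id: str, property_id: str):
    """Stand-in for executing the query: look the pair up in the toy KB."""
    return TOY_KB.get((entity_id, property_id))


def retrieve_passage(query: str, passages: list) -> str:
    """Return the passage sharing the most lowercase words with the query."""
    query_words = set(query.lower().split())

    def overlap(passage: str) -> int:
        return len(query_words & set(passage.lower().rstrip(".").split()))

    return max(passages, key=overlap)


def answer_two_hop(entity_id: str, property_id: str, second_hop: str) -> dict:
    """Hop 1: structured lookup for a bridge entity. Hop 2: text retrieval."""
    sparql = build_sparql(entity_id, property_id)
    bridge = kb_lookup(entity_id, property_id)
    if bridge is None:
        raise LookupError(f"no toy KB entry for {entity_id}/{property_id}")
    evidence = retrieve_passage(second_hop.format(bridge=bridge), TOY_PASSAGES)
    return {"sparql": sparql, "bridge": bridge, "evidence": evidence}


if __name__ == "__main__":
    result = answer_two_hop("Q42", "P27", "capital of the {bridge}")
    for key, value in result.items():
        print(f"{key}: {value}")
```

The point of the sketch is the dependency between hops: the text query cannot be formed until the structured hop has returned the bridge entity, which is why a wrong or missing symbolic query makes the whole question unanswerable.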

DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

Augmenting Reasoning Capabilities of LLMs with Graph Structures in Knowledge Base Question Answering

LLM-based Discriminative Reasoning for Knowledge Graph Question Answering

A Framework of Knowledge Graph-Enhanced Large Language Model Based on Question Decomposition and Atomic Retrieval

Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering

FiDeLiS: Faithful Reasoning in Large Language Model for Knowledge Graph Question Answering

Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models

CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting

KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph

Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering

An Enhanced Prompt-Based LLM Reasoning Scheme via Knowledge Graph-Integrated Collaboration

CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge

Enhancing Large Language Models with Knowledge Graphs for Robust Question Answering

MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge

Right for Right Reasons: Large Language Models for Verifiable Commonsense Knowledge Graph Question Answering

ReasoningLM: Enabling Structural Subgraph Reasoning in Pre-trained Language Models for Question Answering over Knowledge Graph

Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

Retrieval and Reasoning on KGs: Integrate Knowledge Graphs into Large Language Models for Complex Question Answering

LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities