Abstract: Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying solely on their internal knowledge, especially when answering questions that require less commonly known information. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge. Nonetheless, recent approaches have primarily emphasized retrieval from unstructured text corpora, owing to its seamless integration into prompts. When using structured data such as knowledge graphs, most methods simplify it into natural text, neglecting the underlying structures. Moreover, a significant gap in the current landscape is the absence of a realistic benchmark for evaluating the effectiveness of grounding LLMs on heterogeneous knowledge sources (e.g., a knowledge base and text). To fill this gap, we have curated a comprehensive dataset that poses two unique challenges: (1) two-hop multi-source questions that require retrieving information from both open-domain structured and unstructured knowledge sources, where retrieval from the structured source is critical to answering the questions correctly; and (2) the generation of symbolic queries (e.g., SPARQL for Wikidata), which adds another layer of challenge. Our dataset is created using a combination of automatic generation through predefined reasoning chains and human annotation. We also introduce a novel approach that leverages multiple retrieval tools, including text passage retrieval and symbolic language-assisted retrieval. Our model outperforms previous approaches by a significant margin, demonstrating its effectiveness in addressing the above-mentioned reasoning challenges.

Knowledge Crosswords: Geometric Knowledge Reasoning with Large Language Models

Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models

KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models

Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models

Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs

CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge

DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

Resolving Knowledge Conflicts in Large Language Models

GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving

SLR: A million-scale comprehensive crossword dataset for simultaneous learning and reasoning

Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study over Open-ended Question Answering

GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning

Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models

Systematic Assessment of Factual Knowledge in Large Language Models

KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts

Logic Query of Thoughts: Guiding Large Language Models to Answer Complex Logic Queries with Knowledge Graphs

Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering

PuzzleBench: Can LLMs Solve Challenging First-Order Combinatorial Reasoning Problems?