Abstract: Large language models (LLMs) sometimes perform poorly on knowledge-intensive tasks such as commonsense reasoning. Researchers typically address this by retrieving related knowledge from knowledge graphs or by employing self-enhancement methods to elicit knowledge from LLMs. However, noisy knowledge and invalid reasoning hamper their ability to answer questions accurately. To this end, we propose a novel method named eliciting, filtering and integrating knowledge in large language model (LINKED). In it, we design a reward model to filter out noisy knowledge and adopt a marginal consistent reasoning module to reduce invalid reasoning. In comprehensive experiments on two complex commonsense reasoning benchmarks, our method outperforms SOTA baselines (up to a 9.0% improvement in accuracy). In addition, to measure the positive and negative impact of the injected knowledge, we propose a new metric for knowledge enhancement work called the effectiveness-preservation score. Finally, through extensive experiments, we conduct an in-depth analysis and draw many meaningful conclusions about LLMs in commonsense reasoning tasks.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to address the poor performance of large language models (LLMs) in commonsense reasoning tasks. Specifically, although LLMs have improved in handling knowledge-intensive tasks, they still exhibit significant shortcomings in commonsense reasoning. The paper points out that existing methods mainly retrieve relevant knowledge from knowledge graphs or use self-enhancement methods to extract knowledge from LLMs, but these methods have two major issues:

1. **Noisy Knowledge**: The knowledge generated by LLMs may contain significant noise, which is detrimental to reasoning.
2. **Invalid Reasoning**: Even when provided with reasonable knowledge, LLMs sometimes still arrive at incorrect answers.

To address these issues, the paper proposes a new method called **LINKED** (eLiciting, fIltering and iNtegrating Knowledge in large languagE moDel), which filters noisy knowledge by designing a reward model and introduces a marginal consistency reasoning module to reduce invalid reasoning. Extensive experiments on four complex commonsense reasoning benchmark datasets show that this method significantly improves accuracy and performs well on the newly proposed Effectiveness Preservation Score (EPS) metric.

### Main Contributions

1. **Proposing the LINKED Method**: This method enhances the commonsense reasoning ability of LLMs through effective knowledge and introduces a new evaluation metric, EPS, to assess the effectiveness and harmfulness of knowledge enhancement methods.
2. **Designing a Reward Model**: Training a reward model to mitigate the noisy knowledge generated by LLMs.
3. **Marginal Consistency Reasoning Module**: Designing this module to address the issue of invalid reasoning.
4. **Extensive Experimental Validation**: Conducting experiments on multiple benchmark datasets, showing that this method significantly outperforms existing methods, especially on the WinoGrande dataset, with a 9.0% increase in accuracy and a 12.5% increase in EPS.

### Method Overview

1. **Knowledge Pool Construction**: Prompting LLMs to generate multiple relevant knowledge fragments through in-context learning and classifying them into different confidence levels based on their impact on the question answers.
2. **Reward Model Design**: Training a reward model to distinguish between high-quality and low-quality knowledge, filtering noisy knowledge through a ranking task.
3. **Marginal Consistency Reasoning**: During the reasoning phase, improving reasoning stability through multiple sampling and marginal majority voting, thereby reducing the occurrence of invalid reasoning.

### Experimental Results

- **Main Results**: On the WinoGrande, HellaSwag, SocialIQA, and PIQA benchmark datasets, the LINKED method significantly outperforms most existing state-of-the-art methods, particularly on the WinoGrande dataset, with a 9.0% increase in accuracy.
- **Ablation Experiments**: Validating the effectiveness of the reward model and the marginal consistency reasoning module by removing them.
- **Human Evaluation**: Verifying the effectiveness of the LINKED method in addressing noisy knowledge and invalid reasoning through human evaluation, and showing that the newly proposed EPS metric is highly correlated with human evaluation.

### Conclusion

By proposing the LINKED method, this paper effectively addresses the issues of noisy knowledge and invalid reasoning in LLMs for commonsense reasoning tasks, significantly improving model performance. Additionally, the newly proposed EPS metric provides strong support for evaluating the effectiveness and harmfulness of knowledge enhancement methods.
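The filtering and voting steps in the method overview can be illustrated with a small Python sketch. This is not the paper's implementation: `filter_knowledge`, `majority_vote`, the `top_k` and `min_margin` parameters, and the toy scores are all hypothetical names and values chosen for illustration. The reward model is replaced by an arbitrary scoring function, and the vote is plain majority voting with a simple margin check, since this summary does not specify the exact marginal consistency rule.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def filter_knowledge(
    candidates: Sequence[str],
    score_fn: Callable[[str], float],
    top_k: int = 2,
) -> List[str]:
    """Rank candidate knowledge by a reward-style score and keep the top_k.

    score_fn is a stand-in for a trained reward model (assumption):
    higher scores mean the knowledge is judged more useful.
    """
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:top_k]


def majority_vote(answers: Sequence[str], min_margin: int = 1) -> Tuple[str, bool]:
    """Return the most frequent sampled answer and whether it is 'stable'.

    An answer counts as stable when it beats the runner-up by at least
    min_margin votes. The margin rule is an illustrative assumption.
    """
    if not answers:
        raise ValueError("answers must be non-empty")
    counts = Counter(answers).most_common()
    top_answer, top_count = counts[0]
    runner_up_count = counts[1][1] if len(counts) > 1 else 0
    return top_answer, (top_count - runner_up_count) >= min_margin


if __name__ == "__main__":
    # Toy scores standing in for reward-model outputs (made-up values).
    toy_scores = {
        "A suitcase can be too small to hold a large object.": 0.9,
        "Trophies are awarded to winners.": 0.4,
        "The sky is blue.": 0.1,
    }
    kept = filter_knowledge(list(toy_scores), lambda k: toy_scores[k], top_k=2)
    print(kept)

    # Answers sampled from several hypothetical reasoning runs.
    sampled = ["suitcase", "trophy", "suitcase", "suitcase", "trophy"]
    print(majority_vote(sampled, min_margin=1))
```

Separating the scoring function from the filter keeps the sketch independent of any particular reward model: a trained ranker could be passed in as `score_fn` without changing the rest of the code.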

LINKED: Eliciting, Filtering and Integrating Knowledge in Large Language Model for Commonsense Reasoning
