LINKED: Eliciting, Filtering and Integrating Knowledge in Large Language Model for Commonsense Reasoning

Jiachun Li, Pengfei Cao, Chenhao Wang, Zhuoran Jin, Yubo Chen, Kang Liu, Xiaojian Jiang, Jiexin Xu, Jun Zhao
2024-10-12
Abstract: Large language models (LLMs) sometimes perform poorly on knowledge-intensive tasks, and commonsense reasoning is one of them. Researchers typically address these issues by retrieving related knowledge from knowledge graphs or by employing self-enhancement methods to elicit knowledge in LLMs. However, noisy knowledge and invalid reasoning hamper their ability to answer questions accurately. To this end, we propose a novel method named eliciting, filtering and integrating knowledge in large language model (LINKED). In it, we design a reward model to filter out noisy knowledge and use a marginal consistent reasoning module to reduce invalid reasoning. In comprehensive experiments on two complex commonsense reasoning benchmarks, our method outperforms SOTA baselines (up to 9.0% improvement in accuracy). Besides, to measure the positive and negative impact of the injected knowledge, we propose a new metric called effectiveness-preservation score for knowledge enhancement works. Finally, through extensive experiments, we conduct an in-depth analysis and draw many meaningful conclusions about LLMs in commonsense reasoning tasks.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address the poor performance of large language models (LLMs) on commonsense reasoning tasks. Although LLMs have improved on knowledge-intensive tasks, they still show significant shortcomings in commonsense reasoning. The paper points out that existing methods mainly retrieve relevant knowledge from knowledge graphs or use self-enhancement methods to elicit knowledge from LLMs, but these methods suffer from two major issues:

1. **Noisy Knowledge**: The knowledge generated by LLMs may contain significant noise, which is detrimental to reasoning.
2. **Invalid Reasoning**: Even when provided with reasonable knowledge, LLMs sometimes still arrive at incorrect answers.

To address these issues, the paper proposes a new method called **LINKED** (eLiciting, fIltering and iNtegrating Knowledge in large languagE moDel), which filters noisy knowledge with a trained reward model and introduces a marginal consistency reasoning module to reduce invalid reasoning. Extensive experiments on four complex commonsense reasoning benchmark datasets show that the method significantly improves accuracy and performs well on the newly proposed Effectiveness-Preservation Score (EPS) metric.

### Main Contributions

1. **Proposing the LINKED Method**: The method strengthens the commonsense reasoning ability of LLMs with effective knowledge, and the paper introduces a new evaluation metric, EPS, to assess both the effectiveness and the harmfulness of knowledge enhancement methods.
2. **Designing a Reward Model**: A reward model is trained to mitigate the noisy knowledge generated by LLMs.
3. **Marginal Consistency Reasoning Module**: This module is designed to address the issue of invalid reasoning.
4. **Extensive Experimental Validation**: Experiments on multiple benchmark datasets show that the method significantly outperforms existing methods, especially on the WinoGrande dataset, with a 9.0% increase in accuracy and a 12.5% increase in EPS.

### Method Overview

1. **Knowledge Pool Construction**: LLMs are prompted through in-context learning to generate multiple relevant knowledge statements, which are then classified into confidence levels according to their impact on the answer to the question.
2. **Reward Model Design**: A reward model is trained on a ranking task to distinguish high-quality from low-quality knowledge, so that noisy knowledge can be filtered out.
3. **Marginal Consistency Reasoning**: During reasoning, multiple sampling and marginal majority voting improve reasoning stability, thereby reducing the occurrence of invalid reasoning.

### Experimental Results

- **Main Results**: On the WinoGrande, HellaSwag, SocialIQA, and PIQA benchmark datasets, LINKED significantly outperforms most existing state-of-the-art methods, particularly on WinoGrande, with a 9.0% increase in accuracy.
- **Ablation Experiments**: Removing the reward model and the marginal consistency reasoning module in turn validates the contribution of each.
- **Human Evaluation**: Human evaluation confirms that LINKED mitigates noisy knowledge and invalid reasoning, and shows that the newly proposed EPS metric is highly correlated with human judgments.

### Conclusion

With LINKED, this paper effectively addresses the issues of noisy knowledge and invalid reasoning in LLMs on commonsense reasoning tasks, significantly improving model performance. Additionally, the newly proposed EPS metric provides strong support for evaluating the effectiveness and harmfulness of knowledge enhancement methods.
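The knowledge pool step sorts generated knowledge into confidence levels "based on their impact on the question answers." The summary does not state the exact criterion, so the sketch below assumes one plausible rule: compare the model's probability of the gold answer with and without a knowledge statement, and bucket the statement by the change. The names `Confidence`, `confidence_level`, `build_knowledge_pool`, the three-level scheme, and the `margin` value are all hypothetical illustrations, not the paper's definitions.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class Confidence(IntEnum):
    """Hypothetical three-level scheme; higher value means more helpful."""
    LOW = 0      # knowledge lowers the gold-answer probability (likely noise)
    NEUTRAL = 1  # knowledge barely changes the gold-answer probability
    HIGH = 2     # knowledge raises the gold-answer probability


def confidence_level(p_gold_without: float, p_gold_with: float,
                     margin: float = 0.05) -> Confidence:
    """Bucket one knowledge statement by how much it shifts P(gold answer).

    Assumption: impact is measured as a probability difference, with a
    small dead zone of +/- margin treated as "no effect".
    """
    delta = p_gold_with - p_gold_without
    if delta > margin:
        return Confidence.HIGH
    if delta < -margin:
        return Confidence.LOW
    return Confidence.NEUTRAL


def build_knowledge_pool(candidates: List[str], p_gold_without: float,
                         score_with: Callable[[str], float],
                         margin: float = 0.05) -> Dict[Confidence, List[str]]:
    """Group candidate knowledge by confidence level.

    score_with(k) is a hypothetical stand-in for an LLM call that returns
    P(gold answer | question, k).
    """
    pool: Dict[Confidence, List[str]] = {level: [] for level in Confidence}
    for k in candidates:
        pool[confidence_level(p_gold_without, score_with(k), margin)].append(k)
    return pool


if __name__ == "__main__":
    # Toy probabilities in place of real model scores.
    fake_scores = {"k_helpful": 0.9, "k_irrelevant": 0.52, "k_noisy": 0.2}
    pool = build_knowledge_pool(list(fake_scores), 0.5, fake_scores.__getitem__)
    for level, items in pool.items():
        print(level.name, items)
```

Using an `IntEnum` keeps the levels ordered, so "which of two statements should rank higher" becomes a plain integer comparison when building training pairs for a ranking objective.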
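The reward model is trained "through a ranking task" to separate high-quality from low-quality knowledge and is then used to filter noisy knowledge. The summary does not give the loss, so the sketch below uses a common choice for ranking-style reward models: a pairwise logistic (Bradley-Terry style) loss, `-log sigmoid(r_hi - r_lo)`, averaged over every pair whose confidence levels differ. The reward scores are plain floats standing in for a model's outputs; `pairwise_ranking_loss`, `filter_knowledge`, `top_k`, and `threshold` are hypothetical names, and this is not claimed to be LINKED's exact objective.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(rewards: Sequence[float], levels: Sequence[int]) -> float:
    """Mean of -log sigmoid(r_hi - r_lo) over pairs with different levels.

    levels[i] is the confidence level of knowledge i (higher = better).
    Pairs with equal levels carry no preference and are skipped.
    """
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(rewards)), 2):
        if levels[i] == levels[j]:
            continue
        hi, lo = (i, j) if levels[i] > levels[j] else (j, i)
        # -log sigmoid(d) == log(1 + exp(-d)) == softplus(-d)
        total += softplus(-(rewards[hi] - rewards[lo]))
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


def filter_knowledge(knowledge: Sequence[str], rewards: Sequence[float],
                     top_k: int = 2, threshold: Optional[float] = None) -> List[str]:
    """Keep the top_k highest-reward statements, optionally above a threshold."""
    ranked = sorted(zip(knowledge, rewards), key=lambda kr: kr[1], reverse=True)
    if threshold is not None:
        ranked = [kr for kr in ranked if kr[1] >= threshold]
    return [k for k, _ in ranked[:top_k]]


if __name__ == "__main__":
    levels = [2, 1, 0]  # helpful, neutral, noisy
    good = pairwise_ranking_loss([2.0, 0.5, -1.0], levels)  # rewards agree with levels
    bad = pairwise_ranking_loss([-1.0, 0.5, 2.0], levels)   # rewards reversed
    print(f"aligned loss={good:.3f}, reversed loss={bad:.3f}")
    print(filter_knowledge(["k1", "k2", "k3"], [0.1, 0.9, 0.5], top_k=2))
```

The loss is small when the rewards already order knowledge the same way as its confidence levels and grows as the order flips, which is what lets a scalar reward later act as a filter.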
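Marginal consistency reasoning combines "multiple sampling and marginal majority voting." One reading, assumed here rather than taken from the paper, is that answers are sampled several times under each retained knowledge statement and the votes are pooled across all statements, i.e. marginalized over knowledge, before taking the majority. `marginal_majority_vote`, `sample_answer`, and the optional `knowledge_weights` are hypothetical; `sample_answer` stands in for a stochastic LLM call.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, Tuple


def marginal_majority_vote(
    question: str,
    knowledge_list: Sequence[str],
    sample_answer: Callable[[str, str], Hashable],
    n_samples: int = 5,
    knowledge_weights: Optional[Sequence[float]] = None,
) -> Tuple[Hashable, Counter]:
    """Pool sampled answers over all knowledge statements and return the majority.

    Each knowledge statement contributes n_samples votes, optionally scaled by
    a per-knowledge weight (for example a reward score). Ties are broken by
    the order in which answers were first seen, per Counter.most_common.
    """
    if knowledge_weights is None:
        knowledge_weights = [1.0] * len(knowledge_list)
    votes: Counter = Counter()
    for knowledge, weight in zip(knowledge_list, knowledge_weights):
        for _ in range(n_samples):
            votes[sample_answer(question, knowledge)] += weight
    if not votes:
        raise ValueError("no votes collected; knowledge_list is empty")
    answer, _ = votes.most_common(1)[0]
    return answer, votes


if __name__ == "__main__":
    def toy_sampler(question: str, knowledge: str) -> str:
        # Deterministic stand-in for an LLM: "k_bad" misleads toward "B".
        return "B" if knowledge == "k_bad" else "A"

    answer, votes = marginal_majority_vote("q", ["k1", "k2", "k_bad"], toy_sampler, n_samples=3)
    print(answer, dict(votes))
```

With the toy sampler, the misleading statement casts 3 votes for "B" while the other two cast 6 for "A", so pooling across knowledge outvotes a single bad statement. That robustness to one noisy piece of knowledge or one bad sample is the intuition behind voting over many reasoning paths.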