Abstract: Open-ended question answering requires models to find appropriate evidence to form well-reasoned, comprehensive, and helpful answers. In practical applications, models also need to engage in extended discussions of potential scenarios closely relevant to the question. When augmented with a retrieval module, open-source Large Language Models (LLMs) can produce coherent answers, often with different focuses, but they remain sub-optimal in reliable evidence selection and in-depth question analysis. In this paper, we propose a novel Chain-of-Discussion framework that leverages the synergy among multiple open-source LLMs, which are not strong enough individually, to provide **more correct** and **more comprehensive** answers for open-ended QA. Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers. We release our data and code at https://github.com/kobayashikanna01/Chain-of-Discussion.

What problem does this paper attempt to address?

This paper addresses insufficient evidence selection and question analysis when answering complex open-ended questions. Although existing large language models (LLMs) can generate coherent answers to such questions, they still fall short in selecting reliable evidence and in analyzing the question in depth. These shortcomings show up in two ways:

1. **Imperfect retrieval models**: The retriever may introduce noisy evidence that the LLM fails to filter out, which hurts the completeness and accuracy of the answer. For example, in legal consultation, the retriever may return provisions on guardianship qualifications instead of those on the duty of economic support, because the two are semantically similar.
2. **Comprehensiveness and consistency of answers**: The model is expected not only to give a correct answer but also to explain it consistently and to offer useful advice for situations the user may face now or in the future. This is hard even for humans, especially when it requires accessing appropriate evidence, and harder still for LLMs without task-specific training or fine-tuning.

To address these challenges, the paper proposes a framework named "Chain-of-Discussion", which improves the correctness and comprehensiveness of answers through interactive discussion among multiple open-source LLMs. The framework has the LLMs summarize, criticize, and revise each other's outputs so that they converge on a more evidence-based and practical answer.

The main contributions of the paper are:

1. A high-quality complex evidence-based question answering (CEBQA) dataset of 200 carefully annotated legal consultation questions on marriage and family affairs.
2. A new discussion-chain framework, summarize-criticize-revise, which exploits the synergy among multiple open-source LLMs to produce more accurate and useful answers.
3. GPT-4-based and evidence-centered evaluations showing that the framework helps small LLMs benefit from one another and improves overall answer quality, especially correctness and comprehensiveness.
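The summarize-criticize-revise loop described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: each model is assumed to be a plain prompt-to-text function, and the prompt wording, the choice of the first model as summarizer and reviser, and the names `chain_of_discussion`, `DiscussionTrace`, `Model`, and `make_stub` are all illustrative. The released code may order or structure the steps differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interface: a "model" maps a prompt string to a completion string.
# Real open-source LLMs would be wrapped behind this signature (assumption).
Model = Callable[[str], str]


@dataclass
class DiscussionTrace:
    """Keeps every intermediate output so the discussion can be inspected."""
    candidates: List[str] = field(default_factory=list)
    summary: str = ""
    critiques: List[str] = field(default_factory=list)
    final_answer: str = ""


def chain_of_discussion(question: str, evidence: List[str],
                        models: List[Model]) -> DiscussionTrace:
    """Illustrative summarize-criticize-revise loop (not the authors' code)."""
    if not models:
        raise ValueError("at least one model is required")
    trace = DiscussionTrace()
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(evidence))

    # Each model first answers independently from the same retrieved evidence.
    for model in models:
        trace.candidates.append(
            model(f"Evidence:\n{context}\nQuestion: {question}\nAnswer:"))

    # Summarize: the first model merges the candidate answers (assumed role).
    joined = "\n---\n".join(trace.candidates)
    trace.summary = models[0](
        f"Summarize these answers to '{question}':\n{joined}")

    # Criticize: the remaining models check the summary against the evidence,
    # e.g. flagging provisions that are only superficially similar to the question.
    critics = models[1:] or models
    for model in critics:
        trace.critiques.append(
            model(f"Evidence:\n{context}\nCritique this answer:\n{trace.summary}"))

    # Revise: the first model rewrites the summary in light of the critiques.
    feedback = "\n".join(trace.critiques)
    trace.final_answer = models[0](
        f"Revise the answer:\n{trace.summary}\nCritiques:\n{feedback}")
    return trace


def make_stub(name: str) -> Model:
    """Hypothetical stand-in for an LLM: echoes its name and the first prompt line."""
    return lambda prompt: f"{name}: {prompt.splitlines()[0]}"


if __name__ == "__main__":
    trace = chain_of_discussion(
        "Must an adult child financially support an elderly parent?",
        ["Provision A (placeholder)", "Provision B (placeholder)"],
        [make_stub("m1"), make_stub("m2"), make_stub("m3")],
    )
    print(len(trace.candidates), len(trace.critiques))  # 3 2
    print(trace.final_answer)  # m1: Revise the answer:
```

The stub models only echo the first line of each prompt, so the control flow can be checked without loading any LLM; in practice each `Model` would wrap a call to a retrieval-augmented open-source LLM. Keeping every intermediate output in `DiscussionTrace` makes it easy to see which critiques changed the final answer.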

Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering

CoQ: An Empirical Framework for Multi-hop Question Answering Empowered by Large Language Models

Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering

KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering

An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism

Leveraging Large Language Models for Multiple Choice Question Answering

Enhancing Large Language Models with Pseudo- and Multisource- Knowledge Graphs for Open-ended Question Answering

ALR²: A Retrieve-then-Reason Framework for Long-context Question Answering

L2R-QA: An Open-Domain Question Answering Framework

Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering

Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models

Drilling Down into the Discourse Structure with LLMs for Long Document Question Answering

ReasonChainQA: Text-based Complex Question Answering with Explainable Evidence Chains

Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models

Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning

EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs

Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models

Federated Prompting and Chain-of-Thought Reasoning for Improving LLMs Answering

Conv-CoA: Improving Open-domain Question Answering in Large Language Models via Conversational Chain-of-Action