Abstract: We present a Chain-of-Action (CoA) framework for multimodal and retrieval-augmented Question-Answering (QA). Compared to the literature, CoA overcomes two major challenges of current QA applications: (i) unfaithful hallucination that is inconsistent with real-time or domain facts and (ii) weak reasoning performance over compositional information. Our key contribution is a novel reasoning-retrieval mechanism that decomposes a complex question into a reasoning chain via systematic prompting and pre-designed actions. Methodologically, we propose three types of domain-adaptable "Plug-and-Play" actions for retrieving real-time information from heterogeneous sources. We also propose a multi-reference faith score (MRFS) to verify and resolve conflicts in the answers. Empirically, we exploit both public benchmarks and a Web3 case study to demonstrate the capability of CoA over other methods.
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve
The paper aims to address two main challenges faced by large language models (LLMs) when answering complex questions:
1. **Unfaithful Generation**: The responses from LLMs may not align with real-time or domain-specific facts. For example, in Figure 1(b), the LLM fails to locate relevant facts.
2. **Weak Reasoning Ability**: LLMs struggle to aggregate heterogeneous information sources, resolve conflicts among them, and provide useful, customized responses. For instance, in Figure 1(c), the analysis still stalls even though relevant search results are successfully located.
To tackle these challenges, the paper proposes a new framework, **Chain-of-Action (CoA)**, which decomposes complex questions into reasoning chains through systematic prompting and predefined actions. Specifically, the main contributions of the CoA framework include:
- **Novel Reasoning-Retrieval Mechanism**: Decomposing complex questions into reasoning chains through systematic prompts and predefined actions.
- **Three Pluggable Domain-Adaptive Actions**: Retrieving real-time information from heterogeneous sources, encoding domain knowledge, and analyzing tabular and numerical data.
- **Multi-Reference Faith Score (MRFS)**: Verifying and resolving conflicts in the answers.
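
To make the "pluggable" idea concrete, here is a minimal Python sketch of how the three kinds of action could sit behind one shared interface. The registry, the action names (`web_query`, `knowledge_encoding`, `data_analyzing`), and the canned return values are assumptions for illustration only; the summary does not describe the authors' actual implementation.

```python
from typing import Callable, Dict, List

# Hypothetical action signature: takes a sub-question, returns text snippets.
Action = Callable[[str], List[str]]

# Registry mapping an action name to its implementation.
ACTIONS: Dict[str, Action] = {}


def register(name: str) -> Callable[[Action], Action]:
    """Decorator that plugs an action into the registry under `name`."""
    def wrap(fn: Action) -> Action:
        ACTIONS[name] = fn
        return fn
    return wrap


@register("web_query")
def web_query(sub_question: str) -> List[str]:
    # Stand-in for a real-time search call; returns canned text here.
    return [f"[web] result for: {sub_question}"]


@register("knowledge_encoding")
def knowledge_encoding(sub_question: str) -> List[str]:
    # Stand-in for a lookup in a domain knowledge store.
    return [f"[kb] entry for: {sub_question}"]


@register("data_analyzing")
def data_analyzing(sub_question: str) -> List[str]:
    # Stand-in for a query over tabular or numerical data.
    return [f"[table] rows for: {sub_question}"]


def run_action(name: str, sub_question: str) -> List[str]:
    """Dispatch to a registered action; an unknown name raises KeyError."""
    return ACTIONS[name](sub_question)
```

With this layout, adapting to a new domain would mean registering another function rather than changing the dispatch code, which is one plausible reading of "plug-and-play".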
### Experiments and Applications
The paper demonstrates the capabilities of the CoA framework through public benchmarks and a Web3 case study. Experimental results show that the CoA framework outperforms existing methods on multiple QA datasets and achieves significant user engagement and positive feedback in practical applications.
### Main Contributions
1. **Proposing the CoA Framework**: Integrating a new reasoning-retrieval mechanism that decomposes complex questions into configurable action reasoning chains.
2. **Designing Three Pluggable Domain-Adaptive Actions**: For real-time information retrieval, domain knowledge encoding, and tabular data analysis.
3. **Introducing the Multi-Reference Faith Score (MRFS)**: Identifying and resolving conflicts between retrieved information and LLM-generated answers, enhancing answer reliability.
4. **Experimental Results**: The CoA framework outperforms existing methods in multiple public benchmarks.
5. **Practical Application**: Deploying the CoA framework in a Web3 QA application, significantly improving user engagement and positive feedback, validating its effectiveness and practicality in real-world scenarios.
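
The conflict check behind contribution 3 can be illustrated with a small sketch. This summary does not give the MRFS formula, so the code below swaps in a plain token-overlap F1 against several references and a threshold of 0.5; both the metric and the threshold are assumptions, not the paper's definition.

```python
from collections import Counter
from typing import List


def overlap_f1(answer: str, reference: str) -> float:
    """Token-overlap F1 between an answer and one reference (illustrative metric)."""
    a = Counter(answer.lower().split())
    r = Counter(reference.lower().split())
    common = sum((a & r).values())  # multiset intersection size
    if common == 0:
        return 0.0
    precision = common / sum(a.values())
    recall = common / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def faith_score(answer: str, references: List[str]) -> float:
    """Best agreement between the answer and any retrieved reference."""
    return max((overlap_f1(answer, ref) for ref in references), default=0.0)


def needs_correction(answer: str, references: List[str],
                     threshold: float = 0.5) -> bool:
    """Flag the answer when references exist but none agrees well enough."""
    return bool(references) and faith_score(answer, references) < threshold
```

An answer that shares no tokens with any retrieved reference is flagged for correction, while an answer with no references at all is left alone because there is nothing to contradict it.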
### Methodology
The paper details the methodology of the CoA framework, including how to generate action chains, execute actions, and ultimately generate answers. The specific steps are as follows:
1. **Action Chain Generation**: Generating action chains through in-context learning, with each action node containing a sub-question, a missing flag, and an initial answer.
2. **Action Execution and Monitoring**: Handling the multimodal retrieval needs of nodes through three steps: retrieving relevant information, verifying if the LLM-generated answers need correction, and filling in missing content if necessary.
3. **Final Answer Generation**: Generating the final answer by the LLM based on the refined and processed action chain.
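
The three steps above can be sketched in Python as follows. The `ActionNode` fields mirror the sub-question, missing flag, and initial answer named in step 1; the `retrieve` and `is_faithful` callables and the prompt layout are hypothetical stand-ins, since the summary does not specify how the authors implement them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ActionNode:
    sub_question: str
    missing: bool      # True when the LLM could not propose an answer
    answer: str = ""   # initial answer from the LLM, possibly empty


def execute_chain(
    chain: List[ActionNode],
    retrieve: Callable[[str], str],
    is_faithful: Callable[[str, str], bool],
) -> List[ActionNode]:
    """Retrieve evidence per node, then fill or correct the node's answer."""
    for node in chain:
        evidence = retrieve(node.sub_question)   # retrieve relevant information
        if node.missing:
            node.answer = evidence               # fill in missing content
            node.missing = False
        elif not is_faithful(node.answer, evidence):
            node.answer = evidence               # correct a conflicting answer
    return chain


def final_prompt(question: str, chain: List[ActionNode]) -> str:
    """Assemble the processed chain into a prompt for the final LLM call."""
    facts = "\n".join(f"- {n.sub_question} {n.answer}" for n in chain)
    return f"Question: {question}\nResolved sub-questions:\n{facts}\nAnswer:"
```

In this sketch the retrieved evidence simply overwrites a missing or conflicting answer. A real system would more likely pass the evidence back to the LLM for rewriting, so treat the overwrite as a simplification.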
### Experimental Analysis
The paper presents a detailed experimental analysis, showcasing the performance advantages of the CoA framework on different QA datasets. Specifically, it shows:
- **Tasks Without Information Retrieval**: The CoA framework improves by 3.42% over the best existing baseline method (SearchChain without IR).
- **Tasks With Information Retrieval**: The CoA framework improves by 6.14% over the best existing baseline method (SearchChain).
- **Reasoning Steps**: The CoA framework averages more reasoning steps when decomposing complex questions, indicating stronger reasoning capabilities in handling complex problems.
- **LLM Usage Frequency**: The CoA framework reduces the frequency of LLM usage, improving efficiency.
- **Resistance to External Knowledge Interference**: The CoA framework demonstrates higher accuracy in handling external knowledge, indicating strong data parsing and filtering capabilities.
### Conclusion
The paper empirically demonstrates that the CoA framework outperforms existing methods in understanding and answering complex queries, showing stronger reasoning capabilities and greater resistance to erroneous external information. On the strength of these findings, the paper positions CoA as a new benchmark in question answering and fact-checking.