Abstract: We present Multi-expert Prompting, a novel enhancement of ExpertPrompting (Xu et al., 2023), designed to improve large language model (LLM) generation. Specifically, it guides an LLM to fulfill an input instruction by simulating multiple experts, aggregating their responses, and selecting the best among individual and aggregated responses. This process is performed in a single chain of thoughts through our seven carefully designed subtasks derived from the Nominal Group Technique (Ven and Delbecq, 1974), a well-established decision-making framework. Our evaluations demonstrate that Multi-expert Prompting significantly outperforms ExpertPrompting and comparable baselines in enhancing the truthfulness, factuality, informativeness, and usefulness of responses while reducing toxicity and hurtfulness. It further achieves state-of-the-art truthfulness by outperforming the best baseline by 8.69% with ChatGPT. Multi-expert Prompting is efficient, explainable, and highly adaptable to diverse scenarios, eliminating the need for manual prompt construction.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper aims to address the reliability and safety issues in the responses generated by large language models (LLMs). Specifically, the authors propose a new method called **Multi-expert Prompting** to improve the following aspects of LLM-generated responses:

1. **Truthfulness**: Ensuring that the model-generated responses are consistent with facts, reducing misleading information.
2. **Factuality**: Ensuring that the generated content is based on real data and facts.
3. **Toxicity**: Reducing harmful or offensive language in the generated content.
4. **Hurtfulness**: Avoiding content that may cause emotional harm to users.
5. **Informativeness**: Increasing the amount of information in the generated content, providing more details and in-depth insights.
6. **Usefulness**: Ensuring that the generated content has practical value for users and effectively conveys information.

### Method Overview

**Multi-expert Prompting** generates responses by simulating multiple experts and then aggregating these responses to select the best answer. The specific steps are as follows:

1. **Expert and Response Generation**:
   - Given an input instruction, the model first generates the identities and brief descriptions of multiple experts.
   - Each expert independently responds to the input instruction, generating multiple long-form expert responses.
2. **Expert Response Aggregation**:
   - Through 7 carefully designed sub-tasks, the multiple expert responses are aggregated into a final response.
   - These sub-tasks include identifying consensus views, conflicting views, unique perspectives, and ultimately selecting the best response.

### Main Contributions

1. **Performance Improvement**: Experimental results show that Multi-expert Prompting significantly outperforms existing baseline methods, excelling in truthfulness, factuality, non-toxicity, and non-hurtfulness.
2. **High Adaptability**: This method is applicable to various scenarios without the need for manually constructed prompts.
3. **Strong Interpretability**: Through the 7 sub-tasks, the contribution of each step can be clearly seen, enhancing the model's interpretability.

### Experimental Validation

The authors validated the effectiveness of Multi-expert Prompting through multiple benchmark tests, including TruthfulQA, FactualityPrompt, BOLD, and HONEST. The results show that Multi-expert Prompting significantly outperforms other methods on all metrics, achieving a new state-of-the-art level on the TruthfulQA dataset.

### Conclusion

By integrating the perspectives of multiple experts, Multi-expert Prompting not only improves the quality of the generated content but also enhances the reliability and safety of the model, providing a new approach to solving the generation problems of large language models.
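
The two steps in the Method Overview (expert and response generation, then aggregation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the function names (`generate_experts`, `expert_answers`, `aggregate`, `multi_expert`), the default of three experts, and the `fake_llm` stand-in are all assumptions made for illustration. The aggregation prompt names only the sub-tasks mentioned in this summary; the full set of seven is defined in the paper.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: takes a prompt string, returns a completion string.
LLM = Callable[[str], str]


def generate_experts(llm: LLM, instruction: str, n: int = 3) -> List[str]:
    """Step 1a: ask the model for n expert identities with short descriptions."""
    prompt = (
        f"List {n} experts best suited to answer the instruction below, "
        "one per line as 'Role: short description'.\n"
        f"Instruction: {instruction}"
    )
    lines = [ln.strip() for ln in llm(prompt).splitlines() if ln.strip()]
    return lines[:n]


def expert_answers(llm: LLM, instruction: str, experts: List[str]) -> List[str]:
    """Step 1b: each simulated expert answers the instruction independently."""
    return [
        llm(f"You are {e}.\nAnswer the following instruction in detail.\n"
            f"Instruction: {instruction}")
        for e in experts
    ]


def aggregate(llm: LLM, instruction: str, answers: List[str]) -> str:
    """Step 2: aggregate all expert answers in ONE prompt (a single chain of thought).

    Only the sub-tasks named in the summary are listed; this is an assumed,
    partial rendering of the paper's seven sub-tasks.
    """
    numbered = "\n".join(f"Expert {i + 1}: {a}" for i, a in enumerate(answers))
    prompt = (
        "Below are expert answers to an instruction.\n"
        f"Instruction: {instruction}\n{numbered}\n"
        "Work through these sub-tasks in order:\n"
        "- identify the views the experts agree on\n"
        "- identify the views on which they conflict\n"
        "- identify views unique to a single expert\n"
        "- write an aggregated answer, then select the best response among "
        "the individual and aggregated answers and output it"
    )
    return llm(prompt)


def multi_expert(llm: LLM, instruction: str, n: int = 3) -> Dict[str, object]:
    """Run the whole pipeline and keep intermediate outputs for inspection."""
    experts = generate_experts(llm, instruction, n)
    answers = expert_answers(llm, instruction, experts)
    final = aggregate(llm, instruction, answers)
    return {"experts": experts, "answers": answers, "final": final}


def fake_llm(prompt: str) -> str:
    """Stand-in model for a dry run; replace with a real client call."""
    if prompt.startswith("List"):
        return ("Physician: treats patients\n"
                "Nutritionist: advises on diet\n"
                "Chemist: studies compounds")
    if prompt.startswith("You are"):
        role = prompt[len("You are "):].split(":")[0]
        return f"Answer from {role}"
    return "Aggregated answer"


if __name__ == "__main__":
    result = multi_expert(fake_llm, "Is it safe to eat raw eggs?")
    print(result["experts"])
    print(result["final"])
```

Returning the intermediate expert list and answers alongside the final response reflects the interpretability point made above: each stage can be inspected separately. In this sketch the pipeline issues one call to list experts, one call per expert, and one aggregation call.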

Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models

ExpertPrompting: Instructing Large Language Models to Be Distinguished Experts

Large Language Models are Good Multi-lingual Learners: When LLMs Meet Cross-lingual Prompts

Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning

Automatic Prompt Selection for Large Language Models

PromptExp: Multi-granularity Prompt Explanation of Large Language Models

Towards Generalist Prompting for Large Language Models by Mental Models

PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models

One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

Toward Large Language Models as a Therapeutic Tool: Comparing Prompting Techniques to Improve GPT-Delivered Problem-Solving Therapy

Guiding Large Language Models via Directional Stimulus Prompting

Efficient Prompting Methods for Large Language Models: A Survey

Supervisory Prompt Training

Mitigating Exaggerated Safety in Large Language Models

KnowGPT: Knowledge Graph based Prompting for Large Language Models

MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization

Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding

Metacognitive Prompting Improves Understanding in Large Language Models

PromptAid: Prompt Exploration, Perturbation, Testing and Iteration using Visual Analytics for Large Language Models

Prompting GPT-3 To Be Reliable