Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation

Shuo Tang, Xianghe Pang, Zexi Liu, Bohan Tang, Rui Ye, Xiaowen Dong, Yanfeng Wang, Siheng Chen
2024-10-18
Abstract: Post-training is essential for enabling large language models (LLMs) to follow human instructions. Inspired by the recent success of using LLMs to simulate human society, we leverage multi-agent simulation to automatically generate diverse text-based scenarios, capturing a wide range of real-world human needs. We propose MATRIX, a multi-agent simulator that creates realistic and scalable scenarios. Leveraging these outputs, we introduce a novel scenario-driven instruction generator MATRIX-Gen for controllable and highly realistic data synthesis. Extensive experiments demonstrate that our framework effectively generates both general and domain-specific data. Notably, on AlpacaEval 2 and Arena-Hard benchmarks, Llama-3-8B-Base, post-trained on datasets synthesized by MATRIX-Gen with just 20K instruction-response pairs, outperforms Meta's Llama-3-8B-Instruct model, which was trained on over 10M pairs; see our project at https://github.com/ShuoTang123/MATRIX-Gen.
Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses the difficulty of obtaining high-quality instruction data for post-training large language models (LLMs), in particular data that reflects real-world user needs. Specifically:

1. **Challenges in data acquisition**: Collecting high-quality instruction data from real users is difficult because of privacy concerns, data scarcity, and high labor costs.
2. **Limitations of existing methods**: Existing data synthesis methods usually rely on aligned LLMs to generate new instructions. Although efficient, these methods do not explicitly incorporate real-world user needs into the synthesis process. They also depend heavily on manually designed prompts, which raises the risk of generating unrealistic instructions and makes it harder to control what kind of data is produced.

To address these problems, the paper proposes a framework based on multi-agent simulation that automatically generates diverse text-based scenarios capturing a wide range of real-world human needs. Its main contributions are:

- **Introducing multi-agent simulation**: The paper presents this as the first application of multi-agent simulation to post-training data synthesis for LLMs. Simulating diverse and realistic social scenarios improves the realism of the synthesized data and provides the controllability needed to generate specific, high-quality data.
- **Proposing a new post-training data synthesis framework**: The framework combines a multi-agent social simulator (MATRIX) with a scenario-driven instruction generator (MATRIX-Gen). Using the diverse and realistic scenarios produced by the simulator, it synthesizes high-quality, realistic post-training data for a variety of settings.
- **Extensive experimental evaluation**: Extensive experiments support the framework's effectiveness. On the AlpacaEval 2 and Arena-Hard benchmarks, Llama-3-8B-Base post-trained on 20,000 synthesized instruction-response pairs outperforms Meta's Llama-3-8B-Instruct, which was trained on more than 10 million pairs. The reported gains cover areas such as general problem solving, multi-turn dialogue, coding, and safety.

In short, the paper aims to improve LLM post-training through a new data synthesis method, so that models understand and follow human instructions more effectively.
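
To make the two-stage idea concrete (a simulator that produces scenarios, followed by a generator that turns scenarios into instruction-response pairs), here is a minimal Python sketch. It is not the MATRIX or MATRIX-Gen implementation: the names `Agent`, `Scenario`, `simulate_scenarios`, `generate_instructions`, and `echo_llm`, the prompt wording, and the pairwise-interaction scheme are all assumptions made for illustration. A deterministic placeholder stands in for a real model so the code runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt to generated text.
LLM = Callable[[str], str]


@dataclass
class Agent:
    name: str
    profile: str  # short description of the simulated person (assumed format)


@dataclass
class Scenario:
    participants: List[str]
    description: str


def simulate_scenarios(agents: List[Agent], llm: LLM) -> List[Scenario]:
    """Stage 1 stand-in: ask the model to describe an interaction for each agent pair."""
    scenarios = []
    for i, a in enumerate(agents):
        for b in agents[i + 1:]:
            prompt = f"Describe a realistic interaction between {a.profile} and {b.profile}."
            scenarios.append(Scenario([a.name, b.name], llm(prompt)))
    return scenarios


def generate_instructions(
    scenarios: List[Scenario], llm: LLM, domain: str = "general"
) -> List[Dict[str, str]]:
    """Stage 2 stand-in: derive one instruction-response pair per scenario.

    The `domain` tag is an assumed way of steering generation toward a topic.
    """
    pairs = []
    for s in scenarios:
        instruction = llm(f"[{domain}] Write a user request that could arise in: {s.description}")
        response = llm(f"Answer the request: {instruction}")
        pairs.append({"instruction": instruction, "response": response})
    return pairs


def echo_llm(prompt: str) -> str:
    """Deterministic placeholder so the sketch runs without a real model."""
    return f"<generated for: {prompt[:40]}>"


agents = [
    Agent("A", "a nurse"),
    Agent("B", "a software engineer"),
    Agent("C", "a student"),
]
scenarios = simulate_scenarios(agents, echo_llm)
pairs = generate_instructions(scenarios, echo_llm, domain="coding")
```

In practice `echo_llm` would be replaced by a call to an actual model, and the resulting `pairs` list would serve as supervised fine-tuning data. How the paper's simulator structures agents and their interactions is not described in this summary, so the pairing loop above should be read only as a placeholder.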