Abstract: This paper presents a novel generative model, Collaborative Competitive Agents (CCA), which leverages the capabilities of multiple Large Language Model (LLM)-based agents to execute complex tasks. Drawing inspiration from Generative Adversarial Networks (GANs), the CCA system employs two equal-status generator agents and a discriminator agent. The generators independently process user instructions and generate results, while the discriminator evaluates the outputs and provides feedback that the generator agents use to reflect on and improve their results. Unlike previous generative models, our system exposes the intermediate steps of generation. Because of this transparency, each generator agent can learn from the successful executions of other agents, enabling a collaborative competition that enhances the quality and robustness of the system's results. The primary focus of this study is image editing, where we demonstrate CCA's ability to handle intricate instructions robustly. The paper's main contributions include the introduction of a multi-agent-based generative model with controllable intermediate steps and iterative optimization, a detailed examination of agent relationships, and comprehensive experiments on image editing. Code is available at \href{
What problem does this paper attempt to address?
The paper proposes a new generative model named "Collaborative Competitive Agents (CCA)," aimed at addressing two limitations of existing generative models: their weakness at complex, composite tasks and the difficulty of updating results once they have been generated. Specifically, the paper targets the following two main issues:
1. **Limited ability to handle complex tasks**: Traditional generative models, such as Generative Adversarial Networks (GANs) and diffusion models, perform poorly when faced with complex tasks that involve multiple steps or requirements, such as "colorizing old photos, replacing the person in the picture with the user themselves, and adding a hoe in the user's hand."
2. **Difficulty in iteratively optimizing generated results**: Modifying or refining a result after it has been generated requires retaining the computational graph. Because results may be produced by many different algorithms, maintaining such graphs is a significant challenge, which also makes it hard for one generative model to learn from others.
To address these issues, the paper introduces the CCA system, which uses multiple agents based on large language models (LLMs). The generator agents independently process user instructions and generate results, while a discriminator agent evaluates these results and provides feedback, promoting both collaboration and competition among the generators and thereby improving the quality and robustness of the output. Compared with traditional generative models, a notable advantage of CCA is its transparency: the intermediate steps of generation are visible and controllable, so agents can learn successful strategies from each other and iteratively optimize their results.
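The generator/discriminator interaction described above can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the LLM agents are replaced by a hypothetical `ToyGenerator` class whose "plan" is a list of step names (standing in for the transparent intermediate steps), the discriminator is a hypothetical `discriminate` function that scores coverage of the required edits instead of an LLM judge, and the step names (`colorize`, `swap_face`, `add_hoe`) are invented to mirror the example instruction in point 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Plan = List[str]  # hypothetical: a plan is a visible list of editing steps


@dataclass
class ToyGenerator:
    """Hypothetical stand-in for an LLM generator agent (not the paper's code)."""
    name: str
    skills: List[str]  # steps this toy agent can produce on its own
    plan: Plan = field(default_factory=list)

    def generate(self, feedback: List[str], reference: Plan) -> Plan:
        new_plan = list(self.plan)
        # Reflection: address the steps the discriminator reported as missing,
        # but only those this agent can perform by itself.
        for step in feedback:
            if step in self.skills and step not in new_plan:
                new_plan.append(step)
        # Cooperation: because plans are transparent, reuse steps from the
        # previous round's winning plan.
        for step in reference:
            if step not in new_plan:
                new_plan.append(step)
        self.plan = new_plan
        return new_plan


def discriminate(plan: Plan, required: List[str]) -> Tuple[float, List[str]]:
    """Toy discriminator: score is the fraction of required steps covered;
    feedback is the list of steps still missing."""
    if not required:
        return 1.0, []
    missing = [step for step in required if step not in plan]
    return 1 - len(missing) / len(required), missing


def cca(required: List[str], generators: List[ToyGenerator],
        rounds: int = 3, cooperate: bool = True) -> Tuple[str, Plan, float]:
    # Round 0: every required step counts as "missing" feedback.
    feedback: Dict[str, List[str]] = {g.name: list(required) for g in generators}
    reference: Plan = []  # winning plan from the previous round
    best_name, best_plan, best_score = "", [], -1.0
    for _ in range(rounds):
        results = []
        for g in generators:
            plan = g.generate(feedback[g.name], reference if cooperate else [])
            score, missing = discriminate(plan, required)
            feedback[g.name] = missing
            results.append((score, g.name, plan))
        # Competition: the highest-scoring generator wins this round and its
        # steps become the shared reference for the next round.
        score, name, plan = max(results, key=lambda r: r[0])
        reference = list(plan)
        if score > best_score:
            best_name, best_plan, best_score = name, list(plan), score
        if best_score == 1.0:
            break
    return best_name, best_plan, best_score


if __name__ == "__main__":
    g1 = ToyGenerator("G1", skills=["colorize", "add_hoe"])
    g2 = ToyGenerator("G2", skills=["swap_face"])
    print(cca(["colorize", "swap_face", "add_hoe"], [g1, g2]))
    # ('G2', ['swap_face', 'colorize', 'add_hoe'], 1.0)
```

In this toy setting, G1 wins the first round, and G2 then completes the task by copying G1's visible steps. With `cooperate=False`, neither agent can acquire the other's skill, so the loop stalls at a best score of 2/3, a small illustration of why the paper's transparency matters.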
Furthermore, the paper examines the relationships between agents, including how reflection, cooperation, and competition mechanisms affect the overall performance of the system. The experimental section applies CCA to image editing, showing that the model handles complex editing instructions robustly and significantly outperforms traditional methods.
In summary, the main contributions of the paper include: a multi-agent-based generative model with controllable intermediate steps and iterative optimization; a thorough analysis of the relationships between agents within a multi-agent system; and a series of comprehensive experiments validating the effectiveness of CCA in image editing tasks.