Abstract: This paper presents a novel generative model, Collaborative Competitive Agents (CCA), which leverages multiple Large Language Model (LLM) based agents to execute complex tasks. Drawing inspiration from Generative Adversarial Networks (GANs), the CCA system employs two equal-status generator agents and a discriminator agent. The generators independently process user instructions and generate results, while the discriminator evaluates the outputs and provides feedback that the generator agents use to reflect on and improve their results. Unlike previous generative models, our system exposes the intermediate steps of generation. This transparency allows each generator agent to learn from the other's successful executions, enabling a collaborative competition that enhances the quality and robustness of the system's results. The primary focus of this study is image editing, demonstrating the CCA's ability to handle intricate instructions robustly. The paper's main contributions include the introduction of a multi-agent-based generative model with controllable intermediate steps and iterative optimization, a detailed examination of agent relationships, and comprehensive experiments on image editing. Code is available at \href{

What problem does this paper attempt to address?

The paper proposes a new generative model named "Collaborative Competitive Agents (CCA)," aimed at two limitations of existing generative models: handling complex, composite tasks, and updating results after they have been generated. Specifically, the paper targets the following two issues:

1. **Limited ability to handle complex tasks**: Traditional generative models, such as Generative Adversarial Networks (GANs) and diffusion models, perform poorly on tasks that involve multiple steps or requirements, such as "colorizing old photos, replacing the person in the picture with the user themselves, and adding a hoe in the user's hand."
2. **Difficulty in iterative optimization of generated results**: If a result needs to be modified or optimized after generation, the computational graph must be retained. However, the vast amount of results produced by different algorithms makes maintaining the computational graph a significant challenge, which hinders learning from other generative models.

To address these issues, the paper introduces the CCA system, which uses multiple agents based on large language models (LLMs). The generator agents independently process user instructions and generate results, while a discriminator agent evaluates these results and provides feedback. This promotes both collaboration and competition among the generator agents and improves the quality and robustness of the generated results. Compared to traditional generative models, a notable advantage of CCA is its transparency: because the intermediate steps are visible, agents can learn successful strategies from each other, which gives the system controllable intermediate steps and iterative optimization. The paper also examines the relationships between agents, including how reflection, cooperation, and competition mechanisms affect the overall performance of the system.

The experimental section applies CCA to image editing and reports that the model handles complex editing instructions robustly and significantly outperforms traditional methods. In summary, the main contributions of the paper are: a multi-agent-based generative model with controllable intermediate steps and iterative optimization; a thorough analysis of the relationships between agents within a multi-agent system; and a series of comprehensive experiments validating the effectiveness of CCA on image editing tasks.
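The generate, evaluate, and reflect loop described in this summary can be illustrated with a small toy sketch. This is not the authors' implementation: the names `GeneratorAgent`, `discriminate`, and `run_cca` are hypothetical, the LLM calls are replaced by simple rule-based stubs, a "plan" is just a list of made-up tool names, and the discriminator scores a plan by how many required steps it covers (ignoring step order). It only shows how two equal-status generators could share transparent plans while a discriminator supplies feedback.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Plan = List[str]  # ordered tool names: the visible intermediate steps


@dataclass
class GeneratorAgent:
    """Hypothetical stand-in for an LLM-based generator agent."""
    name: str
    plan: Plan
    history: List[Plan] = field(default_factory=list)

    def reflect(self, own_score: float, hint: Optional[str],
                rival_plan: Plan, rival_score: float) -> None:
        # Keep the previous plan so earlier attempts stay inspectable.
        self.history.append(list(self.plan))
        # Collaboration: copy the rival's plan if it scored strictly higher.
        if rival_score > own_score:
            self.plan = list(rival_plan)
        # Reflection: act on the discriminator's hint if it is still unmet.
        if hint is not None and hint not in self.plan:
            self.plan.append(hint)


def discriminate(plan: Plan, required: List[str]) -> Tuple[float, Optional[str]]:
    """Toy discriminator: coverage score plus one hint (first missing step)."""
    missing = [step for step in required if step not in plan]
    score = (len(required) - len(missing)) / len(required)
    return score, (missing[0] if missing else None)


def run_cca(required: List[str], gen_a: GeneratorAgent, gen_b: GeneratorAgent,
            max_rounds: int = 5) -> Tuple[Plan, float]:
    """Iterate until one plan covers every requirement or rounds run out."""
    best_plan: Plan = []
    best_score = -1.0
    for _ in range(max_rounds):
        score_a, hint_a = discriminate(gen_a.plan, required)
        score_b, hint_b = discriminate(gen_b.plan, required)
        # Competition: remember the best-scoring plan seen so far.
        for agent, score in ((gen_a, score_a), (gen_b, score_b)):
            if score > best_score:
                best_plan, best_score = list(agent.plan), score
        if best_score == 1.0:
            break
        # Snapshot both plans so each agent sees the rival's pre-update plan.
        plan_a, plan_b = list(gen_a.plan), list(gen_b.plan)
        gen_a.reflect(score_a, hint_a, plan_b, score_b)
        gen_b.reflect(score_b, hint_b, plan_a, score_a)
    return best_plan, best_score


# Usage, loosely modeled on the composite instruction quoted in the summary.
required_steps = ["colorize", "swap_person", "add_object"]
agent_a = GeneratorAgent("A", ["colorize"])
agent_b = GeneratorAgent("B", ["swap_person", "add_object"])
final_plan, final_score = run_cca(required_steps, agent_a, agent_b)
print(final_plan, final_score)
```

In this toy run, the weaker agent adopts the stronger agent's plan after the first round, while the stronger agent extends its own plan using the discriminator's hint. A real system would replace the coverage score with an evaluation of the edited image itself.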

CCA: Collaborative Competitive Agents for Image Editing
