SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems

Dong Zhang,Zhaowei Li,Pengyu Wang,Xin Zhang,Yaqian Zhou,Xipeng Qiu

2024-01-08

Abstract: Human communication is a complex and diverse process that not only involves multiple factors such as language, commonsense, and cultural backgrounds but also requires the participation of multimodal information, such as speech. Large Language Model (LLM)-based multi-agent systems have demonstrated promising performance in simulating human society. Can we leverage LLM-based multi-agent systems to simulate human communication? However, current LLM-based multi-agent systems mainly rely on text as the primary medium. In this paper, we propose SpeechAgents, a multi-modal LLM-based multi-agent system designed for simulating human communication. SpeechAgents utilizes a multi-modal LLM as the control center for each individual agent and employs multi-modal signals as the medium for messages exchanged among agents. Additionally, we propose Multi-Agent Tuning to enhance the multi-agent capabilities of LLMs without compromising general abilities. To strengthen and evaluate the effectiveness of human communication simulation, we build the Human-Communication Simulation Benchmark. Experimental results demonstrate that SpeechAgents can simulate human communication dialogues with consistent content, authentic rhythm, and rich emotions, and that it demonstrates excellent scalability even with up to 25 agents, which can apply to tasks such as drama creation and audio novel generation. Code and models will be open-sourced at https://github.com/0nutation/SpeechAgents

Computation and Language

What problem does this paper attempt to address?

This paper addresses how to use multi-agent systems based on large language models (LLMs) to simulate human multi-modal communication. Current LLM-based multi-agent systems mainly rely on text as the medium for information exchange and lack the ability to perceive and generate multi-modal signals. The paper proposes a new multi-modal LLM-based multi-agent system, SpeechAgents, which simulates human communication through multi-modal signals such as speech in order to make the simulated communication more authentic and richer. Specifically, the paper focuses on the following aspects:

1. **Use of multi-modal signals**: Current multi-agent systems mainly rely on text, while human communication is a multi-modal process involving factors such as language, emotion, non-verbal expression, and cultural background. The paper proposes using multi-modal signals (such as speech) as the medium for information exchange between agents to simulate human communication more realistically.
2. **Enhancement of multi-agent capabilities**: To improve the performance of LLMs in multi-agent environments, the paper proposes the Multi-Agent Tuning method, which enhances the multi-agent capabilities of LLMs without compromising their general capabilities.
3. **Establishment of evaluation criteria**: To evaluate the effectiveness of multi-modal human communication simulation, the paper constructs the Human-Communication Simulation Benchmark and evaluates the performance of different systems using multiple metrics.

Through these methods, the paper aims to address the deficiencies of existing LLM-based multi-agent systems in multi-modal communication simulation.
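The agent-and-message structure described above can be illustrated with a minimal sketch. It is not the SpeechAgents implementation: the names `SpeechMessage`, `Agent`, `echo_model`, and `simulate` are hypothetical, the "model" is a toy function standing in for a multi-modal LLM, the `audio` field holds placeholder bytes rather than real speech, and the round-robin speaking order is an assumption made only to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpeechMessage:
    """A message exchanged between agents (hypothetical type)."""
    sender: str
    audio: bytes  # placeholder for an encoded speech signal, not real audio


# Hypothetical stand-in for a multi-modal LLM: given the agent's role and the
# dialogue so far, it returns the next speech payload.
Model = Callable[[str, List[SpeechMessage]], bytes]


def echo_model(role: str, history: List[SpeechMessage]) -> bytes:
    """Toy model that 'speaks' a line recording the role and the turn index."""
    return f"{role} speaking at turn {len(history)}".encode("utf-8")


@dataclass
class Agent:
    """One agent whose control center is a (stubbed) multi-modal model."""
    role: str
    model: Model

    def respond(self, history: List[SpeechMessage]) -> SpeechMessage:
        return SpeechMessage(sender=self.role, audio=self.model(self.role, history))


def simulate(agents: List[Agent], num_turns: int) -> List[SpeechMessage]:
    """Run a dialogue in which agents take turns in round-robin order (an assumption)."""
    history: List[SpeechMessage] = []
    for turn in range(num_turns):
        speaker = agents[turn % len(agents)]
        history.append(speaker.respond(history))
    return history


dialogue = simulate([Agent("Alice", echo_model), Agent("Bob", echo_model)], num_turns=4)
for message in dialogue:
    print(message.sender, message.audio.decode("utf-8"))
```

Because every agent sees the same shared history and only the message type carries the modality, the same loop runs unchanged with more agents; swapping `echo_model` for a real speech-capable model is the part this sketch leaves open.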
