Project Sid: Many-agent simulations toward AI civilization

Altera.AL, Andrew Ahn, Nic Becker, Stephanie Carroll, Nico Christie, Manuel Cortes, Arda Demirci, Melissa Du, Frankie Li, Shuying Luo, Peter Y Wang, Mathew Willows, Feitong Yang, Guangyu Robert Yang
2024-11-01
Abstract: AI agents have been evaluated in isolation or within small groups, where interactions remain limited in scope and complexity. Large-scale simulations involving many autonomous agents -- reflecting the full spectrum of civilizational processes -- have yet to be explored. Here, we demonstrate how 10 to 1000+ AI agents behave and progress within agent societies. We first introduce the PIANO (Parallel Information Aggregation via Neural Orchestration) architecture, which enables agents to interact with humans and other agents in real-time while maintaining coherence across multiple output streams. We then evaluate agent performance in agent simulations using civilizational benchmarks inspired by human history. These simulations, set within a Minecraft environment, reveal that agents are capable of meaningful progress -- autonomously developing specialized roles, adhering to and changing collective rules, and engaging in cultural and religious transmission. These preliminary results show that agents can achieve significant milestones towards AI civilizations, opening new avenues for large simulations, agentic organizational intelligence, and integrating AI into human civilizations.
Artificial Intelligence, Multiagent Systems
### What problems does this paper attempt to solve?

This paper addresses several key problems in large-scale multi-agent simulation, in particular how to construct and evaluate AI civilizations that can coexist and progress alongside human civilization. It focuses on three problems:

1. **Individual agents cannot make sustained progress over long periods**: LLM-driven agents often stall because of hallucinations or because they get trapped in repetitive behavior patterns. Even with planning and reflection modules, errors that accumulate through interaction with the environment can prevent meaningful progress.
2. **Groups of agents cannot progress collaboratively**: Poor communication between agents leads to misunderstandings and spreads hallucinations further, which hinders the group's overall progress. Maintaining coherence across multiple concurrent output streams is a further challenge: an agent may say one thing but do another, which disrupts the group's coordination.
3. **There are no benchmarks for measuring civilizational progress**: Current agent benchmarks mainly target single tasks or small-scale scenarios such as web search, coding, and reasoning. Most existing multi-agent behavior benchmarks are likewise limited to small groups and cannot capture the progress of a large number of agents at the civilization level.
To address these problems, the paper makes the following contributions:

- **PIANO architecture**: A new agent architecture (Parallel Information Aggregation via Neural Orchestration) that lets agents interact with humans and other agents in real time while keeping multiple output streams coherent.
- **Improved individual progress**: Modules such as action awareness help agents ground themselves in the actual state of the environment, which reduces hallucinations and improves their ability to complete individual tasks.
- **Improved multi-agent dynamics**: A social awareness module lets agents infer the emotions and intentions of others, which promotes cooperation and trust and helps agents adapt to competition and conflict in social settings.
- **Benchmarks for civilizational progress**: A set of new evaluation metrics that measure agent progress in specialization, collective rules, and social and cultural transmission in large-scale simulations.

These improvements and benchmarks are intended to advance agent development in simulated environments that resemble human civilization, providing theoretical and technical groundwork for the future integration of AI into human society.
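The coherence problem described above (an agent saying one thing while doing another) can be pictured with a small sketch. This is not the paper's implementation. It assumes, purely for illustration, that input modules run concurrently and write to a shared state, and that one decision is then passed to every output stream. All names here (`AgentState`, `perceive`, `social_awareness`, `decide`, `speak`, `act`, and the `drop_item(bread)` command string) are hypothetical.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AgentState:
    """Shared state that every module reads from and writes to (hypothetical)."""
    observations: List[str] = field(default_factory=list)
    social_notes: List[str] = field(default_factory=list)
    decision: Optional[str] = None


async def perceive(state: AgentState) -> None:
    # Hypothetical perception module: records what the agent currently sees.
    await asyncio.sleep(0)
    state.observations.append("villager nearby asks for bread")


async def social_awareness(state: AgentState) -> None:
    # Hypothetical social module: records an inferred attitude of the other agent.
    await asyncio.sleep(0)
    state.social_notes.append("villager seems friendly")


def decide(state: AgentState) -> str:
    # Single decision point (an assumption of this sketch): aggregate the
    # information gathered in parallel into one high-level decision.
    wants_bread = any("bread" in o for o in state.observations)
    is_friendly = any("friendly" in n for n in state.social_notes)
    return "give_bread" if wants_bread and is_friendly else "idle"


async def speak(decision: str) -> str:
    # Speech output stream, conditioned only on the shared decision.
    await asyncio.sleep(0)
    return {"give_bread": "Sure, here is some bread.", "idle": "..."}[decision]


async def act(decision: str) -> str:
    # Action output stream, conditioned on the same shared decision.
    await asyncio.sleep(0)
    return {"give_bread": "drop_item(bread)", "idle": "noop"}[decision]


async def step(state: AgentState) -> Tuple[str, str]:
    # 1. Run the input modules concurrently.
    await asyncio.gather(perceive(state), social_awareness(state))
    # 2. Make one decision from the aggregated state.
    state.decision = decide(state)
    # 3. Run the output streams concurrently, all driven by that decision.
    utterance, action = await asyncio.gather(
        speak(state.decision), act(state.decision)
    )
    return utterance, action


if __name__ == "__main__":
    print(asyncio.run(step(AgentState())))
```

Because `speak` and `act` both receive the same `decision`, speech and action cannot diverge in this toy setup. A real system would replace the lookup tables with model calls and would need a policy for modules that run at different speeds.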