MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs

Xianhao Yu, Jiaqi Fu, Renjia Deng, Wenjuan Han
2024-05-23
Abstract: While Vision-Language Models (VLMs) hold promise for tasks requiring extensive collaboration, traditional multi-agent simulators have facilitated rich explorations of an interactive artificial society that reflects collective behavior. However, these existing simulators face significant limitations. First, they struggle to handle large numbers of agents because of high resource demands. Second, they often assume that agents possess perfect information and limitless capabilities, which undermines the ecological validity of simulated social interactions. To bridge this gap, we propose MineLand, a multi-agent Minecraft simulator with three key features: large-scale scalability, limited multimodal senses, and physical needs. Our simulator supports 64 or more agents. Agents have limited visual, auditory, and environmental awareness, forcing them to actively communicate and collaborate to fulfill physical needs like food and resources. We further introduce an AI agent framework, Alex, inspired by multitasking theory, which enables agents to handle intricate coordination and scheduling. Our experiments demonstrate that the simulator, the corresponding benchmark, and the AI agent framework contribute to more ecological and nuanced collective behavior. The source code of MineLand and Alex is openly available at
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses two limitations of existing multi-agent open-world simulators: they scale poorly to large numbers of agents, and they assume that agents have perfect information and unlimited capabilities. Both limitations reduce the ecological validity of simulated social interactions; that is, interactions in the simulated environment differ significantly from those of humans in the real world. To bridge this gap, the paper proposes a multi-agent Minecraft simulator named MineLand, which introduces three key features: large-scale scalability, limited multimodal perception, and physiological needs. These features are intended to let the simulator support more agents while reflecting real-world social interactions more realistically. Specifically, MineLand addresses the above problems in the following ways:

1. **Large-scale Scalability**: By reducing the performance overhead of each Minecraft client, MineLand can support 64 or more agents on mainstream consumer-level desktop computers, whereas traditional simulators can usually support only 2 agents.
2. **Limited Multimodal Perception**: Agents act in a partially observable environment, from an egocentric perspective, with limited visual and auditory perception. This design mimics how distance, terrain, and environment affect visibility and hearing in real life. It restricts the information agents can acquire and forces them to communicate actively to compensate for what they cannot sense.
3. **Physiological Needs**: Agents must meet basic physiological needs, such as food, survival, and resource management, which imposes daily routines over time. Agents therefore have to cooperate and compete for resources, reflecting the complex interplay between cooperation and self-interest in human society.

Through these improvements, MineLand not only improves the ecological validity of multi-agent simulation but also provides a rich platform for evaluating multi-agent capabilities based on large language models (LLMs) or vision-language models (VLMs). In addition, the paper proposes an AI agent framework named Alex, which is inspired by multitasking theory from cognitive science and can handle complex coordination and scheduling of concurrent tasks, further enhancing the capabilities of agents.
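
To make the idea of distance-limited hearing concrete, the following is a minimal Python sketch of how a simulator might decide which agents can hear a spoken message. It is an illustration only and not MineLand's actual API: the `Agent` class, the `audible_listeners` function, and the `hearing_radius` value of 16 blocks are all hypothetical assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent with a name and a 3D position (not a MineLand type)."""
    name: str
    x: float
    y: float
    z: float


def audible_listeners(speaker, agents, hearing_radius=16.0):
    """Return the agents close enough to hear the speaker.

    hearing_radius is an assumed value for illustration; the summary above
    does not state the actual range used by the simulator.
    """
    origin = (speaker.x, speaker.y, speaker.z)
    return [
        a for a in agents
        if a is not speaker
        and math.dist(origin, (a.x, a.y, a.z)) <= hearing_radius
    ]


alice = Agent("alice", 0.0, 64.0, 0.0)
bob = Agent("bob", 10.0, 64.0, 0.0)      # 10 blocks away: within range
carol = Agent("carol", 100.0, 64.0, 0.0)  # 100 blocks away: out of range

heard_by = [a.name for a in audible_listeners(alice, [alice, bob, carol])]
```

In this sketch only `bob` receives the message, so `alice` would have to move closer or relay information through `bob` to reach `carol`. This is the kind of pressure toward active communication that the limited-perception design is meant to create.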