Abstract: Recent advancements in natural language and Large Language Models (LLMs) have enabled AI agents to simulate human-like interactions within virtual worlds. However, these interactions still face limitations in complexity and flexibility, particularly in scenarios involving multiple characters and novel objects. Pre-defining all interactable objects in the agent's world model presents challenges, and conveying implicit intentions to multiple characters through complex interactions remains difficult. To address these issues, we propose integrating virtual Game Masters (GMs) into the agent's world model, drawing inspiration from Tabletop Role-Playing Games (TRPGs). GMs play a crucial role in overseeing information, estimating players' intentions, providing environment descriptions, and offering feedback, compensating for current world model deficiencies. To facilitate future explorations for complex interactions, we introduce a benchmark named Tachikuma, comprising a Multiple character and novel Object based interaction Estimation (MOE) task and a supporting dataset. MOE challenges models to understand characters' intentions and accurately determine their actions within intricate contexts involving multi-character and novel object interactions. Besides, the dataset captures log data from real-time communications during gameplay, providing diverse, grounded, and complex interactions for further explorations. Finally, we present a simple prompting baseline and evaluate its performance, demonstrating its effectiveness in enhancing interaction understanding. We hope that our dataset and task will inspire further research in complex interactions with natural language, fostering the development of more advanced AI agents.

What problem does this paper attempt to address?

This paper addresses the limited complexity and flexibility of current AI agents' interactions in virtual worlds, especially in scenarios involving multiple characters and novel objects. Two difficulties stand out: pre-defining every interactable object in the agent's world model is impractical, and conveying implicit intentions to multiple characters through complex interactions remains hard. To address these challenges, the authors propose integrating a virtual Game Master (GM) into the agent's world model, inspired by tabletop role-playing games (TRPGs). The virtual GM oversees information, estimates players' intentions, and provides environment descriptions and feedback, thereby compensating for the deficiencies of the existing world model.

The main contributions of the paper are:

1. Introducing a new benchmark named Tachikuma, which aims to promote the understanding of multi-character and novel-object interactions. The benchmark comprises a Multiple character and novel Object based interaction Estimation (MOE) task and a supporting dataset.
2. Collecting a dataset of real-time gameplay communication logs, which addresses the difficulty of studying long-term, complex multi-character interactions.
3. Proposing a prompting baseline and conducting a comprehensive evaluation of various prompting methods using different large language models (LLMs).
4. Conducting a subjective evaluation, which indicates that methods performing better on the MOE task generate more accurate, natural, and well-grounded responses, thereby enhancing the capabilities of agents.

Through these contributions, the paper hopes to inspire the research community to gain a deeper understanding of complex interactions and to promote the development of more advanced AI agents.
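To make the MOE setup more concrete, here is a minimal Python sketch of how a gameplay log might be turned into a single prompt that asks a model to act as the GM and state each character's intended action. Everything in it is an illustrative assumption: the `Turn` record, the `build_moe_prompt` function, and the prompt wording are hypothetical and are not taken from the paper, whose actual data format and baseline prompt may differ.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One line of a gameplay log (hypothetical schema, not the paper's)."""
    speaker: str  # "GM" or the name of a player character
    text: str


def build_moe_prompt(turns: list[Turn], characters: list[str]) -> str:
    """Assemble one prompt asking for each listed character's intended action."""
    # Render the log as "speaker: text" lines, in order.
    log = "\n".join(f"{t.speaker}: {t.text}" for t in turns)
    names = ", ".join(characters)
    return (
        "You are the Game Master of a tabletop role-playing game.\n"
        "Game log:\n"
        f"{log}\n\n"
        f"For each of these characters ({names}), state the action they "
        "intend to take, or 'none' if they take no action."
    )


if __name__ == "__main__":
    # Invented example log; the object ("rusted lever") is not pre-defined anywhere.
    example = [
        Turn("GM", "A rusted lever juts from the wall."),
        Turn("Aria", "I want to see what it does."),
    ]
    print(build_moe_prompt(example, ["Aria", "Borin"]))
```

The resulting string would be sent to whichever LLM is under evaluation; parsing the model's reply and scoring it against annotated intentions is left out of this sketch.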

Tachikuma: Understanding Complex Interactions with Multi-Character and Novel Objects by Large Language Models

MindAgent: Emergent Gaming Interaction

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information

Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games

Chat with the Environment: Interactive Multimodal Perception Using Large Language Models

LMAgent: A Large-scale Multimodal Agents Society for Multi-user Simulation

CGMI: Configurable General Multi-Agent Interaction Framework

Human Simulacra: Benchmarking the Personification of Large Language Models

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions

MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments

LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments

Theory of Mind for Multi-Agent Collaboration via Large Language Models

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

A Survey on Large Language Model-Based Game Agents

MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator

Large Multimodal Agents: A Survey