Abstract: Recent advancements in natural language processing and Large Language Models (LLMs) have enabled AI agents to simulate human-like interactions within virtual worlds. However, these interactions remain limited in complexity and flexibility, particularly in scenarios involving multiple characters and novel objects. Pre-defining every interactable object in the agent's world model is challenging, and conveying implicit intentions to multiple characters through complex interactions remains difficult. To address these issues, we propose integrating virtual Game Masters (GMs) into the agent's world model, drawing inspiration from Tabletop Role-Playing Games (TRPGs). GMs play a crucial role in overseeing information, estimating players' intentions, providing environment descriptions, and offering feedback, thereby compensating for deficiencies in current world models. To facilitate future exploration of complex interactions, we introduce a benchmark named Tachikuma, comprising a Multiple character and novel Object based interaction Estimation (MOE) task and a supporting dataset. MOE challenges models to understand characters' intentions and accurately determine their actions in intricate contexts involving multi-character and novel-object interactions. In addition, the dataset consists of log data captured from real-time communication during gameplay, providing diverse, grounded, and complex interactions for further exploration. Finally, we present a simple prompting baseline and evaluate its performance, demonstrating its effectiveness in enhancing interaction understanding. We hope that our dataset and task will inspire further research on complex natural-language interactions, fostering the development of more advanced AI agents.
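Scoring an MOE-style answer can be illustrated with a minimal sketch, assuming each answer is an exact (character, action, target) triple and that predictions are compared with reference triples using set-based precision, recall, and F1. The `Interaction` type, the `tuple_f1` helper, and the example strings are hypothetical; the abstract does not specify the task's output format or its evaluation metric.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical MOE-style answer: who acts, what they do, and on what."""
    character: str
    action: str
    target: str


def tuple_f1(predicted: list[Interaction], reference: list[Interaction]) -> dict[str, float]:
    """Set-based precision/recall/F1 over exact-match triples (an assumed metric)."""
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)  # triples matching the reference exactly
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    # Guard against division by zero when nothing matches.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical turn: two characters act, one of them on a novel object.
    gold = [
        Interaction("Rogue", "pick lock", "rune-carved chest"),
        Interaction("Cleric", "cast light", "corridor"),
    ]
    guess = [
        Interaction("Rogue", "pick lock", "rune-carved chest"),
        Interaction("Cleric", "attack", "goblin"),
    ]
    print(tuple_f1(guess, gold))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Exact string matching keeps the sketch simple, but it treats paraphrases such as "open lock" as misses, which may be too strict for free-form LLM outputs.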

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

Learning through Dialogue Interactions by Asking Questions

Dialogue Learning with Human-in-the-Loop

Learning to Speak and Act in a Fantasy Text Adventure Game

Language Urban Odyssey: A Serious Game for Enhancing Second Language Acquisition Through Large Language Models

Situated Dialogue Learning through Procedural Environment Generation

Mastering Emergent Language: Learning to Guide in Simulated Navigation

How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds

Interactive Grounded Language Acquisition and Generalization in a 2D World

Enhancing Agent Learning through World Dynamics Modeling

Ambient Adventures: Teaching ChatGPT on Developing Complex Stories

Learning to Model the World with Language

Toward Co-creative Dungeon Generation via Transfer Learning

Solving Dialogue Grounding Embodied Task in a Simulated Environment using Further Masked Language Modeling

Tachikuma: Understanding Complex Interactions with Multi-Character and Novel Objects by Large Language Models

Game On: Towards Language Models as RL Experimenters

Learning to Win by Reading Manuals in a Monte-Carlo Framework

Grounding Language with Visual Affordances over Unstructured Data

CALYPSO: LLMs as Dungeon Masters' Assistants

Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents