Polycraft World AI Lab (PAL): An Extensible Platform for Evaluating Artificial Intelligence Agents

Stephen A. Goss, Robert J. Steininger, Dhruv Narayanan, Daniel V. Olivença, Yutong Sun, Peng Qiu, Jim Amato, Eberhard O. Voit, Walter E. Voit, Eric J. Kildebeck

DOI: https://doi.org/10.48550/arXiv.2301.11891

2023-01-28

Abstract: As artificial intelligence research advances, the platforms used to evaluate AI agents need to adapt and grow to continue to challenge them. We present the Polycraft World AI Lab (PAL), a task simulator with an API based on the Minecraft mod Polycraft World. Our platform is built to allow AI agents with different architectures to easily interact with the Minecraft world, train and be evaluated in multiple tasks. PAL enables the creation of tasks in a flexible manner as well as having the capability to manipulate any aspect of the task during an evaluation. All actions taken by AI agents and external actors (non-player-characters, NPCs) in the open-world environment are logged to streamline evaluation. Here we present two custom tasks on the PAL platform, one focused on multi-step planning and one focused on navigation, and evaluations of agents solving them. In summary, we report a versatile and extensible AI evaluation platform with a low barrier to entry for AI researchers to utilize.
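The abstract describes external agents interacting with the Minecraft world through PAL's API. As a minimal sketch of what the agent's side of such an interface might look like, the Python below assumes a newline-delimited text protocol over TCP: the agent sends one command per line and reads back one JSON object per line. The framing, the default address `127.0.0.1:9000`, the reply fields, and the names `PalStyleClient`, `_fake_server`, and `demo` are all hypothetical and are not taken from the PAL documentation. A stand-in server on a local socket pair lets the sketch run without the simulator.

```python
from __future__ import annotations

import json
import socket
import threading


class PalStyleClient:
    """Send one text command per line; read one JSON reply per line.

    Assumption: this framing is illustrative, not the PAL specification.
    """

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock
        self._reader = sock.makefile("r", encoding="utf-8", newline="\n")

    @classmethod
    def connect(cls, host: str = "127.0.0.1", port: int = 9000) -> PalStyleClient:
        # Hypothetical default address; check the PAL documentation.
        return cls(socket.create_connection((host, port)))

    def send(self, command: str) -> dict:
        # Write the command as a single line, then block for one reply line.
        self._sock.sendall((command + "\n").encode("utf-8"))
        line = self._reader.readline()
        if not line:
            raise ConnectionError("server closed the connection")
        return json.loads(line)

    def close(self) -> None:
        self._reader.close()
        self._sock.close()


def _fake_server(sock: socket.socket) -> None:
    # Stand-in for the simulator so the sketch runs offline: it acknowledges
    # every command with a JSON reply. The reply fields are hypothetical.
    with sock, sock.makefile("r", encoding="utf-8", newline="\n") as reader:
        for line in reader:
            reply = {"command": line.strip(), "result": "SUCCESS"}
            sock.sendall((json.dumps(reply) + "\n").encode("utf-8"))


def demo(commands: list[str]) -> list[dict]:
    """Run the client against the fake server and collect its replies."""
    client_end, server_end = socket.socketpair()
    server = threading.Thread(target=_fake_server, args=(server_end,))
    server.start()
    client = PalStyleClient(client_end)
    try:
        return [client.send(cmd) for cmd in commands]
    finally:
        # Closing the client ends the server's read loop, so join() returns.
        client.close()
        server.join()
```

For example, `demo(["SENSE_INVENTORY", "MOVE w"])` returns two replies, the first being `{"command": "SENSE_INVENTORY", "result": "SUCCESS"}`; the command strings are placeholders rather than a verified list of PAL commands. Keeping the transport behind a single `send` method means the agent's decision logic does not depend on how messages are framed, which suits agents built on different architectures.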

Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses the limited flexibility, scalability, and adaptability of current platforms for evaluating artificial intelligence (AI) agents. Many existing platforms are restricted to a specific task or set of tasks, or evaluate agents only in closed environments. Neither is sufficient to assess, or to drive the development of, a new generation of agents capable of open-world learning, transfer learning, and lifelong learning.

To overcome these limitations, the paper introduces the Polycraft World AI Lab (PAL), a simulation platform built on Minecraft through the Polycraft World mod, which provides a flexible, scalable, and user-friendly environment for evaluating AI agents of different architectures across multiple tasks. PAL's task-creation system and high degree of customizability let researchers design more complex and realistic task scenarios that better approximate real-world challenges. Its API allows a variety of external AI agents to interact with the platform, lowering the barrier to entry, and its comprehensive logging simplifies the evaluation of agent behavior. The paper demonstrates the platform with two tasks: POGO, a multi-step planning task, and HUGA, a navigation task. Beyond testing the basic capabilities of AI agents, these tasks can be made more complex and can include external actors such as NPCs to increase their real-world relevance.
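Because PAL logs every action taken by AI agents and by external actors such as NPCs, evaluation metrics can be computed from the log instead of being instrumented inside each agent. The sketch below assumes a hypothetical log schema (a `LogEntry` with step, actor, action, success, and cost fields) that is not PAL's actual log format, and a hypothetical `summarize` function that aggregates the log into per-actor action counts, success rates, and total cost.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical schema for one logged action; not PAL's actual log format.
    step: int
    actor: str     # who acted, e.g. "agent" or an NPC identifier
    action: str    # the command as issued
    success: bool  # whether the simulator reported success
    cost: float    # assumed per-action cost used for scoring


def summarize(log: list[LogEntry]) -> dict[str, dict[str, float]]:
    """Aggregate a flat action log into per-actor evaluation metrics."""
    # Group entries by actor so agent and NPC behavior are scored separately.
    by_actor: dict[str, list[LogEntry]] = {}
    for entry in log:
        by_actor.setdefault(entry.actor, []).append(entry)

    summary: dict[str, dict[str, float]] = {}
    for actor, entries in by_actor.items():
        count = len(entries)
        summary[actor] = {
            "actions": count,
            # Booleans sum as 0/1, so this is the fraction of successful actions.
            "success_rate": sum(e.success for e in entries) / count,
            "total_cost": sum(e.cost for e in entries),
        }
    return summary
```

For a log containing two agent actions (one successful, costs 1.0 and 2.0) and one NPC action, `summarize` reports an agent success rate of 0.5 and a total agent cost of 3.0. Keying the summary by actor keeps NPC activity from inflating or diluting the agent's own scores.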

Minecraft as an Experimental World for AI in Robotics

JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft

Odyssey: Empowering Minecraft Agents with Open-World Skills

APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents

Creative Agents: Empowering Agents with Imagination for Creative Tasks

Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction

STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

ParliRobo: Participant Lightweight AI Robots for Massively Multiplayer Online Games (MMOGs)

The Arcade Learning Environment: An Evaluation Platform for General Agents

Towards Evaluating Generalist Agents: An Automated Benchmark in Open World

WebArena: A Realistic Web Environment for Building Autonomous Agents

Scaling Instructable Agents Across Many Simulated Worlds

See and Think: Embodied Agent in Virtual Environment

Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks

IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents

InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent

Project Sid: Many-agent simulations toward AI civilization