Polycraft World AI Lab (PAL): An Extensible Platform for Evaluating Artificial Intelligence Agents

Stephen A. Goss, Robert J. Steininger, Dhruv Narayanan, Daniel V. Olivença, Yutong Sun, Peng Qiu, Jim Amato, Eberhard O. Voit, Walter E. Voit, Eric J. Kildebeck
DOI: https://doi.org/10.48550/arXiv.2301.11891
2023-01-28
Abstract: As artificial intelligence research advances, the platforms used to evaluate AI agents need to adapt and grow to continue to challenge them. We present the Polycraft World AI Lab (PAL), a task simulator with an API based on the Minecraft mod Polycraft World. Our platform is built to allow AI agents with different architectures to easily interact with the Minecraft world, train and be evaluated in multiple tasks. PAL enables the creation of tasks in a flexible manner as well as having the capability to manipulate any aspect of the task during an evaluation. All actions taken by AI agents and external actors (non-player-characters, NPCs) in the open-world environment are logged to streamline evaluation. Here we present two custom tasks on the PAL platform, one focused on multi-step planning and one focused on navigation, and evaluations of agents solving them. In summary, we report a versatile and extensible AI evaluation platform with a low barrier to entry for AI researchers to utilize.
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the limited flexibility, scalability, and adaptability of current platforms for evaluating artificial intelligence (AI) agents. Many existing platforms are restricted to a specific task or set of tasks, or test agents only in a closed environment. Neither is sufficient to comprehensively evaluate, or to drive the development of, a new generation of AI agents capable of open-world learning, transfer learning, and lifelong learning. To overcome these limitations, the paper introduces the Polycraft World AI Lab (PAL), a simulation platform built on the Minecraft mod Polycraft World. PAL aims to provide a more flexible, scalable, and user-friendly environment for evaluating AI agents with different architectures across multiple tasks. Its task-creation system and high degree of customization let researchers design more complex and realistic task scenarios that better simulate real-world challenges. PAL also lets external AI agents interact with the environment through its API, which lowers the barrier to entry for researchers, and it logs all actions taken by agents and NPCs, which simplifies the evaluation of agent behavior. The paper presents two example tasks, the POGO task (multi-step planning) and the HUGA task (navigation), to demonstrate the platform's functions and potential. These tasks evaluate the basic capabilities of AI agents, and they can also be made more complex, for example by introducing external actors such as NPCs, to improve their real-world relevance.