Abstract: Describing skills in natural language has the potential to provide an accessible way to inject human knowledge about decision-making into an AI system. We present MaestroMotif, a method for AI-assisted skill design, which yields high-performing and adaptable agents. MaestroMotif leverages the capabilities of Large Language Models (LLMs) to effectively create and reuse skills. It first uses an LLM's feedback to automatically design rewards corresponding to each skill, starting from their natural language description. Then, it employs an LLM's code generation abilities, together with reinforcement learning, for training the skills and combining them to implement complex behaviors specified in language. We evaluate MaestroMotif using a suite of complex tasks in the NetHack Learning Environment (NLE), demonstrating that it surpasses existing approaches in both performance and usability.

What problem does this paper attempt to address?

This paper asks how human knowledge can be injected into AI systems through skills described in natural language, so as to obtain high-performing and adaptable agents. The authors propose MaestroMotif, a method for AI-assisted skill design that uses the capabilities of large language models (LLMs) to create and reuse skills.

#### Main problem background:

1. **Complexity of low-level skill design**:
   - Existing frameworks require heavy involvement from human experts: collecting data for specific skills, developing heuristics, or manually engineering rewards. This demands specialist knowledge and a great deal of time and effort, which limits the applicability and generality of these methods.
2. **Challenges of skills described in natural language**:
   - Although humans can easily describe skills in natural language, turning these descriptions into a form that an AI system can act on remains a major challenge. Traditional LLM-based systems usually leave the skill-design problem to humans, who must manually write the policies that the LLM then uses.

#### MaestroMotif's solution:

MaestroMotif introduces a new paradigm: AI-Assisted Skill Design. In this paradigm, humans provide skills described in natural language, and an AI assistant automatically converts these descriptions into usable low-level policies. The specific steps are as follows:

1. **Automated skill reward design**:
   - Starting from the natural-language description of each skill, LLM feedback is used to automatically generate the corresponding reward function. This is done with the Motif method: the LLM labels preferences over a dataset of interactions, and these preferences are distilled into a reward function.
2. **Generate skill initiation/termination functions**:
   - The LLM generates code that defines the initiation and termination functions of each skill. These functions determine when a skill can be activated and when it should be terminated.
3. **Generate the training-time policy over skills**:
   - According to the task requirements, the LLM generates a training-time policy over skills, π_T, which determines which skill to activate in each state. This helps the skills learn under a state distribution closer to the one encountered at deployment and avoids redundancy.
4. **Train skills through reinforcement learning**:
   - The components above are combined, and each skill policy π_{ω_i} is trained with reinforcement learning to maximize its corresponding reward function r_{φ_i}. When the termination condition of a skill is met, the next skill is selected and execution continues.
5. **Zero-shot control**:
   - At deployment, users specify a task in natural language, and MaestroMotif uses the LLM to generate code that combines the existing skills to achieve the required behavior immediately, without additional training.

### Summary

By combining the natural-language understanding and code-generation capabilities of LLMs, MaestroMotif significantly simplifies the skill-design process. It lets AI systems make more efficient use of the high-level information humans provide, and it reduces the reliance on manual design and the complexity of traditional methods.
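The way initiation functions, termination functions, and a policy over skills fit together can be illustrated with a small toy loop. This is only a sketch under stated assumptions, not the authors' implementation: the `Skill` class, `run_skills`, `prefer_descend`, the `descend`/`recover` skills, and the dictionary state with `depth` and `hp` fields are all hypothetical names chosen for illustration. In MaestroMotif the skill policies are neural networks trained with reinforcement learning and the selection code is written by an LLM; here both are replaced by hand-written stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical toy state; the real system observes NetHack, not this dict.
State = Dict[str, int]


@dataclass
class Skill:
    name: str
    can_start: Callable[[State], bool]    # initiation function
    should_stop: Callable[[State], bool]  # termination function
    act: Callable[[State], State]         # stand-in for a learned skill policy


def run_skills(
    state: State,
    skills: List[Skill],
    select: Callable[[State, List[Skill]], Skill],
    max_steps: int = 20,
) -> Tuple[State, List[str]]:
    """Run skills one step at a time, re-selecting whenever one terminates."""
    trace: List[str] = []
    active = None
    for _ in range(max_steps):
        if active is None or active.should_stop(state):
            # Only skills whose initiation function accepts the state are eligible.
            available = [s for s in skills if s.can_start(state)]
            if not available:
                break
            active = select(state, available)
        state = active.act(state)
        trace.append(active.name)
    return state, trace


# Two hand-written toy skills (assumptions, not skills from the paper).
SKILLS = [
    Skill(
        name="descend",
        can_start=lambda s: s["hp"] > 5,
        should_stop=lambda s: s["hp"] <= 5,
        act=lambda s: {**s, "depth": s["depth"] + 1, "hp": s["hp"] - 3},
    ),
    Skill(
        name="recover",
        can_start=lambda s: s["hp"] < 10,
        should_stop=lambda s: s["hp"] >= 10,
        act=lambda s: {**s, "hp": s["hp"] + 2},
    ),
]


def prefer_descend(state: State, available: List[Skill]) -> Skill:
    """Stand-in for the code-based policy over skills: descend when allowed."""
    for skill in available:
        if skill.name == "descend":
            return skill
    return available[0]


if __name__ == "__main__":
    final_state, trace = run_skills({"depth": 1, "hp": 10}, SKILLS, prefer_descend, max_steps=10)
    print(final_state, trace)
```

Keeping the selection logic as ordinary code is what makes the zero-shot step plausible in this sketch: changing the task only means swapping `prefer_descend` for a different function, while the skills themselves stay untouched.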

MaestroMotif: Skill Design from Artificial Intelligence Feedback
