MaestroMotif: Skill Design from Artificial Intelligence Feedback

Martin Klissarov, Mikael Henaff, Roberta Raileanu, Shagun Sodhani, Pascal Vincent, Amy Zhang, Pierre-Luc Bacon, Doina Precup, Marlos C. Machado, Pierluca D'Oro
2024-12-12
Abstract: Describing skills in natural language has the potential to provide an accessible way to inject human knowledge about decision-making into an AI system. We present MaestroMotif, a method for AI-assisted skill design, which yields high-performing and adaptable agents. MaestroMotif leverages the capabilities of Large Language Models (LLMs) to effectively create and reuse skills. It first uses an LLM's feedback to automatically design rewards corresponding to each skill, starting from their natural language description. Then, it employs an LLM's code generation abilities, together with reinforcement learning, for training the skills and combining them to implement complex behaviors specified in language. We evaluate MaestroMotif using a suite of complex tasks in the NetHack Learning Environment (NLE), demonstrating that it surpasses existing approaches in both performance and usability.
Artificial Intelligence, Computation and Language, Machine Learning
### What problem does this paper attempt to solve?

This paper addresses how to inject human knowledge into AI systems through skills described in natural language, in order to obtain high-performing and adaptable agents. Specifically, the authors propose MaestroMotif, a method for AI-assisted skill design that uses the capabilities of large language models (LLMs) to create and reuse skills.

#### Main problem background:

1. **Complexity of low-level skill design**:
   - Existing frameworks require heavy involvement from human experts: collecting data for specific skills, developing heuristics, or hand-engineering rewards. This demands specialist knowledge and a great deal of time and effort, which limits the applicability and generality of these methods.
2. **Challenges of natural-language-described skills**:
   - Although humans can easily describe skills in natural language, turning these descriptions into a form an AI system can act on remains a major challenge. Existing LLM-based systems usually leave the skill-design problem to humans, who must manually write the policies that the LLM then uses.

#### MaestroMotif's solution:

MaestroMotif introduces a new paradigm: AI-Assisted Skill Design. In this paradigm, humans provide skills described in natural language, and an AI assistant automatically converts these descriptions into usable low-level policies. The specific steps are as follows:

1. **Automated skill-reward design**:
   - Starting from the natural-language descriptions, use LLM feedback to automatically generate a reward function for each skill. This is done with the Motif method: the LLM labels preferences over a dataset of interactions, and these preferences are distilled into a reward function.
2. **Generate skill initiation/termination functions**:
   - Use the LLM to generate code that defines the initiation and termination functions of each skill. These functions determine when a skill can be activated and when it should be terminated.
3. **Generate the training-time policy over skills**:
   - According to task requirements, the LLM generates a training-time policy over skills, $\pi_T$, which decides which skill to activate in each state. This helps the skills learn on a state distribution closer to the one seen at deployment and avoids redundancy.
4. **Train skills through reinforcement learning**:
   - Combine the above components and train each skill policy $\pi_{\omega_i}$ through reinforcement learning to maximize its corresponding reward function $r_{\phi_i}$. When the termination condition of a skill is met, the next skill is selected and execution continues.
5. **Zero-shot control**:
   - At deployment, users specify a task in natural language, and MaestroMotif has the LLM generate code that combines the existing skills to produce the required behavior immediately, without additional training.

### Summary

By combining the natural-language understanding and code-generation capabilities of LLMs, MaestroMotif significantly simplifies the skill-design process. It lets AI systems make more efficient use of high-level information provided by humans, and it reduces the reliance on manual design and the complexity of traditional methods.
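The reward-design step (distilling LLM preference labels into a reward function) can be illustrated with a small sketch. The summary does not give the loss or the model, so everything here is an assumption for illustration: a Bradley-Terry style cross-entropy objective, a linear reward over two hand-made features, and a tiny toy preference dataset with made-up labels standing in for LLM annotations.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    """Hypothetical linear reward model: r(s) = w . f(s)."""
    return sum(w * f for w, f in zip(weights, features))

def fit_reward(pairs, n_features, lr=0.5, epochs=200):
    """Fit reward weights from pairwise preferences.

    pairs: list of (features_a, features_b, label), where label is
    1.0 if a is preferred, 0.0 if b is preferred, 0.5 for no preference.
    Assumed objective: cross-entropy between the label and
    sigmoid(r(a) - r(b)), i.e. a Bradley-Terry preference model.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for fa, fb, label in pairs:
            p_a = sigmoid(reward(weights, fa) - reward(weights, fb))
            # Gradient of the cross-entropy w.r.t. the weights is
            # (p_a - label) * (fa - fb); take one SGD step.
            for k in range(n_features):
                weights[k] -= lr * (p_a - label) * (fa[k] - fb[k])
    return weights

# Toy features (assumed): [went_deeper, found_gold].
# Toy labels (assumed) for a hypothetical "descend" skill description:
# the annotator prefers observations where the agent went deeper.
PREFERENCE_PAIRS = [
    ([1.0, 0.0], [0.0, 0.0], 1.0),
    ([0.0, 1.0], [1.0, 0.0], 0.0),
    ([1.0, 1.0], [0.0, 1.0], 1.0),
    ([0.0, 0.0], [0.0, 1.0], 0.5),
]

weights = fit_reward(PREFERENCE_PAIRS, n_features=2)
print("reward for going deeper:", reward(weights, [1.0, 0.0]))
print("reward for finding gold:", reward(weights, [0.0, 1.0]))
```

In this sketch the learned reward ranks "went deeper" above "found gold", matching the toy labels. A real system would replace the hand-made features with a learned model over observations.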
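The way initiation functions, termination functions, and a policy over skills fit together (steps 2 to 5 above) can be sketched as a simple control loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the skill names, the toy state, the hard-coded `act` functions standing in for trained skill policies, and `policy_over_skills` standing in for LLM-generated code are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]  # toy state (assumed): {"depth": ..., "explored": ...}

@dataclass
class Skill:
    name: str
    can_start: Callable[[State], bool]    # initiation function
    should_stop: Callable[[State], bool]  # termination function
    act: Callable[[State], State]         # stand-in for a trained skill policy

def explore_act(state: State) -> State:
    return {**state, "explored": state["explored"] + 1}

def descend_act(state: State) -> State:
    return {"depth": state["depth"] + 1, "explored": 0}

# Hypothetical skills; in the described method the initiation and
# termination functions would be code written by the LLM.
SKILLS = {
    "explore": Skill("explore", lambda s: True,
                     lambda s: s["explored"] >= 3, explore_act),
    "descend": Skill("descend", lambda s: s["explored"] >= 3,
                     lambda s: s["explored"] == 0, descend_act),
}

def policy_over_skills(state: State) -> str:
    """Stand-in for LLM-generated code that picks the next skill."""
    return "descend" if state["explored"] >= 3 else "explore"

def select_skill(state: State) -> Skill:
    """Use the policy's choice if its initiation function allows it,
    otherwise fall back to the first skill that can start."""
    chosen = SKILLS[policy_over_skills(state)]
    if chosen.can_start(state):
        return chosen
    return next(s for s in SKILLS.values() if s.can_start(state))

def run(state: State, target_depth: int,
        max_steps: int = 100) -> Tuple[State, List[str]]:
    trace: List[str] = []
    current = None
    for _ in range(max_steps):
        if state["depth"] >= target_depth:
            break
        # Pick a new skill at the start or when the current one terminates.
        if current is None or current.should_stop(state):
            current = select_skill(state)
            trace.append(current.name)
        state = current.act(state)
    return state, trace

final_state, trace = run({"depth": 1, "explored": 0}, target_depth=3)
print(final_state, trace)
```

The same loop serves training and deployment in this sketch: only `policy_over_skills` changes, which mirrors the idea that new tasks are handled by generating new skill-combining code rather than by retraining the skills.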