Abstract: Zero-shot coordination (ZSC) remains a major challenge in cooperative AI: the goal is to train an agent that can cooperate with an unseen partner in the training environments or even in novel environments. In recent years, a popular ZSC paradigm has been deep reinforcement learning (DRL) combined with advanced self-play or population-based methods, which strengthen the neural policy's ability to handle unseen partners. Despite some success, these approaches usually rely on black-box neural networks as the policy function. Neural networks typically lack interpretability and explicit logic, which makes the learned policies difficult for partners (e.g., humans) to understand and limits their generalization ability. These shortcomings hinder the application of reinforcement learning methods in diverse cooperative scenarios. We propose representing the agent's policy with an interpretable program. Unlike neural networks, programs contain stable logic, but they are non-differentiable and difficult to optimize. To automatically learn such programs, we introduce Knowledge-driven Programmatic reinforcement learning for zero-shot Coordination (KnowPC). We first define a foundational Domain-Specific Language (DSL), including program structures, conditional primitives, and action primitives. A significant challenge is the vast program search space, which makes it difficult to find high-performing programs efficiently. To address this, KnowPC integrates an extractor and a reasoner. The extractor discovers environmental transition knowledge from multi-agent interaction trajectories, while the reasoner deduces the preconditions of each action primitive based on the transition knowledge.
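To make the idea of a programmatic policy concrete, the following is a minimal sketch of a policy built from conditional primitives that guard action primitives. It is not the KnowPC DSL itself: the `Rule` and `Program` classes, the primitive names (`holding_item`, `target_free`, `deliver`, and so on), and the boolean-dictionary state encoding are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: a state is a mapping from conditional-primitive names to truth values.
State = Dict[str, bool]


@dataclass(frozen=True)
class Rule:
    """One IF-THEN branch: all listed conditional primitives must hold."""
    conditions: Tuple[str, ...]  # hypothetical conditional primitives
    action: str                  # hypothetical action primitive


@dataclass
class Program:
    """An ordered list of rules; the first rule whose conditions hold fires."""
    rules: List[Rule]
    default_action: str = "stay"

    def act(self, state: State) -> str:
        for rule in self.rules:
            # Missing primitives are treated as False.
            if all(state.get(c, False) for c in rule.conditions):
                return rule.action
        return self.default_action


# A hand-written example program; a learned program would have the same shape.
policy = Program(rules=[
    Rule(("holding_item", "target_free"), "deliver"),
    Rule(("item_available",), "pick_up"),
])

print(policy.act({"holding_item": True, "target_free": True}))  # first rule fires
print(policy.act({"item_available": True}))                     # second rule fires
print(policy.act({}))                                           # falls back to default
```

Because every branch is an explicit condition-action pair, a partner can read the program directly. Under this framing, the conditions attached to each action are the kind of preconditions that a search procedure would otherwise have to discover by enumeration.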

Zero-shot policy generation in lifelong reinforcement learning

Automatic Curriculum Generation for Reinforcement Learning in Zero-Sum Games

Zero-shot Policy Learning with Spatial Temporal Reward Decomposition on Contingency-aware Observation

KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination

Zero-Shot Policy Transfer with Disentangled Task Representation of Meta-Reinforcement Learning

Synthesizing Programmatic Policy for Generalization Within Task Domain

Zero-Shot Compositional Policy Learning via Language Grounding

RL Zero: Zero-Shot Language to Behaviors without any Supervision

Explore to Generalize in Zero-Shot RL

Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning

Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning

Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference

Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

Statistical Guarantees for Lifelong Reinforcement Learning using PAC-Bayesian Theory

RLingua: Improving Reinforcement Learning Sample Efficiency in Robotic Manipulations With Large Language Models

Efficient Deep Reinforcement Learning Through Policy Transfer

Domain Adaptation In Reinforcement Learning Via Latent Unified State Representation

Federated reinforcement learning for robot motion planning with zero-shot generalization

ZSL-RPPO: Zero-Shot Learning for Quadrupedal Locomotion in Challenging Terrains using Recurrent Proximal Policy Optimization