Abstract: Incorporating natural language instructions into reinforcement learning (RL) to learn semantically meaningful representations and foster generalization has recently attracted considerable attention. However, the semantic information in language instructions is usually entangled with task-specific state information, which hampers the learning of semantically invariant and reusable representations. In this paper, we propose element randomization, a method for learning such representations that extracts task-relevant but environment-agnostic semantics from instructions using a set of environments that share the same language instruction but have randomized elements, e.g., topological structures or textures. We theoretically prove the feasibility of learning semantically invariant representations through randomization. In practice, we accordingly develop a hierarchy of policies, in which a high-level policy modulates the behavior of a goal-conditioned low-level policy by proposing subgoals as semantically invariant representations. Experiments on challenging long-horizon tasks show that (1) our low-level policy reliably generalizes to tasks under environment changes; (2) our hierarchical policy exhibits extensible generalization to unseen tasks that can be decomposed into several solvable sub-tasks; and (3) by storing and replaying language trajectories as succinct policy representations, the agent can complete tasks in a one-shot fashion, i.e., once one successful trajectory has been attained.
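
The data setup behind element randomization, one fixed instruction paired with many environments whose elements vary, can be illustrated with a minimal sketch. This is not the paper's implementation: the `GridEnv` class, the `randomize_elements` helper, the grid size, and the use of wall layout and a texture id as the randomized elements are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GridEnv:
    """Hypothetical toy environment (an assumption, not from the paper)."""
    size: int          # side length of the square grid
    texture: int       # stand-in for a visual texture id
    walls: frozenset   # stand-in for topological structure: blocked cells


def randomize_elements(instruction, n_envs, size=5, seed=0):
    """Pair one fixed instruction with n_envs randomized environments.

    Only the environment elements (walls, texture) vary between pairs;
    the instruction string is shared, so anything a learner can predict
    from the instruction alone cannot depend on those elements.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    batch = []
    for _ in range(n_envs):
        walls = frozenset(rng.sample(cells, k=size))  # random topology
        env = GridEnv(size=size, texture=rng.randrange(10), walls=walls)
        batch.append((instruction, env))
    return batch


if __name__ == "__main__":
    for instr, env in randomize_elements("go to the red door", n_envs=3):
        print(instr, env.texture, sorted(env.walls))
```

In a full system, a representation learner would be trained on such batches so that the subgoal it derives from the instruction stays the same across the randomized environments; that training step is outside the scope of this sketch.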

Learning Actionable Representations with Goal-Conditioned Policies

Learning Hierarchical Graph-Based Policy for Goal-Reaching in Unknown Environments

Learning Efficient Representations for Goal-conditioned Reinforcement Learning Via Tabu Search

Learning Action Representations for Reinforcement Learning

Representation-Driven Reinforcement Learning

Representation learning for continuous action spaces is beneficial for efficient policy learning

Probabilistic Subgoal Representations for Hierarchical Reinforcement Learning

Active Hierarchical Exploration with Stable Subgoal Representation Learning

Learning Subgoal Representations with Slow Dynamics

Backward Learning for Goal-Conditioned Policies

Learning Action-based Representations Using Invariance

Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning

Goal Recognition as Reinforcement Learning

Learning to Represent Action Values as a Hypergraph on the Action Vertices

Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance

Learning Invariable Semantical Representation from Language for Extensible Policy Generalization

Unsupervised State Representation Learning in Atari

Unsupervised Representation Learning in Partially Observable Atari Games

Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning

Learning Intuitive Policies Using Action Features

Goal Space Abstraction in Hierarchical Reinforcement Learning via Reachability Analysis