Abstract: Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state 'task automata' from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP (both initially unknown), by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy. It also provides an interpretable encoding of high-level environmental and task features, so a human can readily verify that the agent has learnt coherent tasks with no misspecifications. In addition, we take steps towards ensuring that the learnt automaton is environment-agnostic, making it well-suited for use in transfer learning. Finally, we provide experimental results compared with two baselines to illustrate our algorithm's performance in different environments and tasks.
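
To make the notion of a task automaton concrete, the following is a minimal sketch, not the paper's learning pipeline. It assumes a hand-written deterministic finite automaton over hypothetical labels ("key", "door") and shows how tracking the automaton state alongside the environment's labels turns a non-Markovian reward ("key, then door") into one that depends only on the current product state. The names `TaskDFA`, `run_product`, and the states `u0`, `u1`, `u2` are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDFA:
    """Deterministic finite automaton over high-level labels (hypothetical)."""
    initial: str
    accepting: frozenset
    transitions: dict  # maps (automaton state, label) -> next automaton state

    def step(self, state, label):
        # Labels with no listed transition leave the automaton state unchanged.
        return self.transitions.get((state, label), state)


# Hypothetical task: "observe the key, then observe the door".
dfa = TaskDFA(
    initial="u0",
    accepting=frozenset({"u2"}),
    transitions={("u0", "key"): "u1", ("u1", "door"): "u2"},
)


def run_product(dfa, labels):
    """Advance the automaton along a trace of environment labels.

    Returns the visited automaton states and a reward of 1 on the step
    where the automaton first enters an accepting state, 0 otherwise.
    """
    u = dfa.initial
    states, rewards = [u], []
    for label in labels:
        nxt = dfa.step(u, label)
        entered = nxt in dfa.accepting and u not in dfa.accepting
        rewards.append(1 if entered else 0)
        u = nxt
        states.append(u)
    return states, rewards


states, rewards = run_product(dfa, ["door", "key", "empty", "door"])
```

In this trace the first "door" earns nothing because the key has not yet been observed. The reward depends on history in the environment alone, but only on the current automaton state in the product.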

Automata Guided Reinforcement Learning With Demonstrations

Hierarchical Temporal Logic Guided Reinforcement Learning

Reinforcement learning with temporal logic rewards

Exploiting Hybrid Policy in Reinforcement Learning for Interpretable Temporal Logic Manipulation

Mission-driven Exploration for Accelerated Deep Reinforcement Learning with Temporal Logic Task Specifications

Deep Reinforcement Learning with Temporal Logics

Automata Guided Semi-Decentralized Multi-Agent Reinforcement Learning

A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

Overcoming Exploration: Deep Reinforcement Learning for Continuous Control in Cluttered Environments from Temporal Logic Specifications

Adaptive Reward Design for Reinforcement Learning in Complex Robotic Tasks

A Framework for Following Temporal Logic Instructions with Unknown Causal Dependencies

Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations

Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping

Directed Exploration in Reinforcement Learning from Linear Temporal Logic

Learning Task Specifications from Demonstrations as Probabilistic Automata

Exploiting Transformer in Sparse Reward Reinforcement Learning for Interpretable Temporal Logic Motion Planning

Temporal Logic Guided Safe Reinforcement Learning Using Control Barrier Functions

Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration

Learning Task Automata for Reinforcement Learning using Hidden Markov Models

Overcoming Exploration in Reinforcement Learning with Demonstrations

GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback