Abstract: Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate this sample complexity, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTL$_f$) formulas or Reward Machines (RMs), to guide the agent's learning. In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environment interactions. Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and it dynamically samples promising tasks that lead to successful goal policies. We evaluate LSTS on a gridworld and show that it achieves improved time-to-threshold performance on complex sequential decision-making problems compared to state-of-the-art RM and automaton-guided RL baselines, such as Q-Learning for Reward Machines and Compositional RL from Logical Specifications (DIRL). Moreover, we demonstrate that our method outperforms RM and automaton-guided RL baselines in terms of sample efficiency, both in a partially observable robotic task and in a continuous control robotic manipulation task.

Deep Reinforcement Learning with Temporal Logics

Directed Exploration in Reinforcement Learning from Linear Temporal Logic

Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees

Modular Deep Reinforcement Learning with Temporal Logic Specifications

Mission-driven Exploration for Accelerated Deep Reinforcement Learning with Temporal Logic Task Specifications

Certified Reinforcement Learning with Logic Guidance

DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications

Reinforcement learning under temporal logic constraints as a sequence modelling problem

Formal Policy Synthesis for Continuous-Space Systems via Reinforcement Learning

Reinforcement Learning with Temporal-Logic-Based Causal Diagrams

Modular Deep Reinforcement Learning for Continuous Motion Planning With Temporal Logic

Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction

Deep Reinforcement Learning Under Signal Temporal Logic Constraints Using Lagrangian Relaxation

A Framework for Following Temporal Logic Instructions with Unknown Causal Dependencies

Eventual Discounting Temporal Logic Counterfactual Experience Replay

Model-Free Reinforcement Learning for Stochastic Games with Linear Temporal Logic Objectives

Deep Inductive Logic Programming meets Reinforcement Learning

Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration

Reinforcement Learning with Temporal Logic Constraints for Partially-Observable Markov Decision Processes

A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents