Abstract: In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), shared latent structure among multiple MDPs has been shown to yield significant gains in sample efficiency over single-task RL. In this paper, we investigate whether such a benefit extends to more general sequential decision-making problems, such as partially observable MDPs (POMDPs) and more general predictive state representations (PSRs). The main challenge is that the large and complex model space makes it hard to identify which types of common latent structure in multi-task PSRs reduce model complexity and improve sample efficiency. To this end, we posit a joint model class for the tasks and use the $\eta$-bracketing number to quantify its complexity; this number also serves as a general metric of task similarity and thus determines the benefit of multi-task over single-task RL. We first study upstream multi-task learning over PSRs, in which all tasks share the same observation and action spaces. We propose a provably efficient algorithm, UMT-PSR, for finding near-optimal policies for all PSRs, and show that the advantage of multi-task learning manifests when the joint model class of PSRs has a smaller $\eta$-bracketing number than that of individual single-task learning. We also provide several examples of multi-task PSRs with small $\eta$-bracketing numbers, which reap the benefits of multi-task learning. We further investigate downstream learning, in which the agent must learn a new target task that shares some commonalities with the upstream tasks via a similarity constraint. By exploiting the PSRs learned upstream, we develop a sample-efficient algorithm that provably finds a near-optimal policy.
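To make the role of the $\eta$-bracketing number more concrete, here is a toy sketch that is not taken from the paper. It assumes each task is just a Bernoulli distribution over {0, 1}, that a bracket is a pair of componentwise lower/upper vectors whose L1 gap is at most $\eta$, and that a bracket for the joint class is a tuple of per-task brackets. All of these modeling choices, and the helper names `bernoulli_brackets`, `covers`, `l1_width`, and `joint_bracket_count`, are hypothetical. Under these assumptions, K independent tasks need every combination of per-task brackets, so the log-count grows linearly in K. If all tasks share one parameter, the count stays at the single-task level. This mirrors the claim that a joint class with a smaller bracketing number is what makes multi-task learning pay off.

```python
import math
from typing import List, Tuple

# A distribution over {0, 1} written as the vector (P(0), P(1)).
Vec = Tuple[float, float]
Bracket = Tuple[Vec, Vec]  # (lower, upper), compared componentwise


def bernoulli_brackets(eta: float) -> List[Bracket]:
    """Hypothetical toy: brackets covering {Bernoulli(p) : p in [0, 1]}.

    Splitting [0, 1] into n intervals [a, b] of length 1/n gives the bracket
    lower = (1 - b, a), upper = (1 - a, b), whose L1 width is 2 (b - a) = 2 / n.
    Choosing n = ceil(2 / eta) keeps every width at most eta.
    """
    n = math.ceil(2 / eta)
    brackets = []
    for i in range(n):
        a, b = i / n, (i + 1) / n
        brackets.append(((1 - b, a), (1 - a, b)))
    return brackets


def l1_width(bracket: Bracket) -> float:
    """L1 distance between the upper and lower ends of a bracket."""
    lower, upper = bracket
    return sum(u - l for l, u in zip(lower, upper))


def covers(bracket: Bracket, p: float) -> bool:
    """True if lower <= (1 - p, p) <= upper componentwise."""
    lower, upper = bracket
    point = (1 - p, p)
    return all(l <= x <= u for l, x, u in zip(lower, point, upper))


def joint_bracket_count(eta: float, num_tasks: int, shared: bool) -> int:
    """Assumed joint bracketing: one per-task bracket for each of the K tasks.

    Independent tasks need every combination of per-task brackets; if all
    tasks share the same p, a single index picks the bracket for every task.
    """
    single = len(bernoulli_brackets(eta))
    return single if shared else single ** num_tasks


if __name__ == "__main__":
    eta, K = 0.25, 4
    brackets = bernoulli_brackets(eta)
    # Every bracket is narrow enough, and every sampled p lies in some bracket.
    assert all(l1_width(b) <= eta for b in brackets)
    assert all(any(covers(b, k / 100) for b in brackets) for k in range(101))
    for shared in (False, True):
        n = joint_bracket_count(eta, K, shared)
        print(f"shared={shared}: N = {n}, log N = {math.log(n):.2f}")
```

With `eta = 0.25` and `K = 4`, this construction gives 8 brackets per task. That is 8^4 = 4096 joint brackets for independent tasks, but only 8 when the tasks fully share their parameter. Real multi-task PSR classes sit between these two extremes, which is what the bracketing number is meant to capture.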

Multi-Task Reinforcement Learning with Cost-based HTN Planning

Leveraging the Efficiency of Multi-Task Robot Manipulation Via Task-Evoked Planner and Reinforcement Learning

FLTRNN: Faithful Long-Horizon Task Planning for Robotics with Large Language Models

Retrieval-Augmented Hierarchical in-Context Reinforcement Learning and Hindsight Modular Reflections for Task Planning with LLMs

Multi-Task Reinforcement Learning with Soft Modularization

Efficient Multi-Task Reinforcement Learning via Task-Specific Action Correction

The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback

Multi-Task Long-Range Urban Driving Based on Hierarchical Planning and Reinforcement Learning

Integrating Task-Motion Planning with Reinforcement Learning for Robust Decision Making in Mobile Robots

Multi-Task Multi-Agent Reinforcement Learning With Interaction and Task Representations

Nl2Hltl2Plan: Scaling Up Natural Language Understanding for Multi-Robots Through Hierarchical Temporal Logic Task Representation

Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning

Human-robot collaborative assembly task planning for mobile cobots based on deep reinforcement learning

Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks

Interactive Task Planning with Language Models

Contrastive Modules with Temporal Attention for Multi-Task Reinforcement Learning

Deep Reinforcement Learning-based Task Assignment and Path Planning for Multi-agent Construction Robots

Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes

LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner

Efficient Multi-agent Reinforcement Learning by Planning

MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model