Abstract: Humans can perform complex tasks with long-term objectives by planning, reasoning, and forecasting the outcomes of actions. For embodied agents to achieve similar capabilities, they must gain knowledge of the environment that transfers to novel scenarios with a limited budget of additional trial and error. Learning-based approaches, such as deep RL, can discover and exploit inherent regularities and characteristics of the application domain from data and continuously improve their performance, but at the cost of large amounts of training data. This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks, focusing on enhancing learning efficiency, interpretability, and transferability across novel scenarios. It makes four key contributions. 1) CALVIN, a differentiable planner that learns interpretable models of the world for long-term planning. It successfully navigated partially observable 3D environments, such as mazes and indoor rooms, by learning the rewards and state transitions from expert demonstrations. 2) SOAP, an RL algorithm that discovers options without supervision for long-horizon tasks. Options segment a task into subtasks and enable consistent execution of each subtask. SOAP showed robust performance on history-conditional corridor tasks as well as on classical benchmarks such as Atari. 3) LangProp, a code optimisation framework that uses LLMs to solve embodied agent problems requiring reasoning, by treating code as learnable policies. The framework successfully generated interpretable code with performance comparable or superior to human-written experts in the CARLA autonomous driving benchmark. 4) Voggite, an embodied agent with a vision-to-action transformer backend that solves complex tasks in Minecraft. It achieved third place in the MineRL BASALT Competition by identifying action triggers to segment tasks into multiple stages.

Multi-State-Space Reasoning Reinforcement Learning for Long-Horizon RFID-Based Robotic Searching and Planning Tasks

RIRL: A Recurrent Imitation and Reinforcement Learning Method for Long-Horizon Robotic Tasks

S2RL: Do We Really Need to Perceive All States in Deep Multi-Agent Reinforcement Learning?

Retrieval-Augmented Hierarchical in-Context Reinforcement Learning and Hindsight Modular Reflections for Task Planning with LLMs

Multi-Task Long-Range Urban Driving Based on Hierarchical Planning and Reinforcement Learning

Hybrid Information-driven Multi-agent Reinforcement Learning

Robot Representation and Reasoning with Knowledge from Reinforcement Learning

Robotic Search & Rescue via Online Multi-task Reinforcement Learning

LIRL: Latent Imagination-Based Reinforcement Learning for Efficient Coverage Path Planning

Reinforcement Learning Within the Classical Robotics Stack: A Case Study in Robot Soccer

Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks

Multi-robot Social-aware Cooperative Planning in Pedestrian Environments Using Multi-agent Reinforcement Learning

Efficient Reinforcement Learning of Task Planners for Robotic Palletization through Iterative Action Masking Learning

SRLM: Human-in-Loop Interactive Social Robot Navigation with Large Language Model and Deep Reinforcement Learning

Spatial Reasoning and Planning for Deep Embodied Agents

Learning of Long-Horizon Sparse-Reward Robotic Manipulator Tasks With Base Controllers

SRL-VIC: A Variable Stiffness-Based Safe Reinforcement Learning for Contact-Rich Robotic Tasks

RePLan: Robotic Replanning with Perception and Language Models

Case-based reinforcement learning for dynamic inventory control in a multi-agent supply-chain system