Abstract: Despite tremendous progress over the past decade, deep learning methods generally fall short of human-level systematic generalization. It has been argued that explicitly capturing the underlying structure of data should allow connectionist systems to generalize in a more predictable and systematic manner. Indeed, evidence in humans suggests that interpreting the world in terms of symbol-like compositional entities may be crucial for intelligent behavior and high-level reasoning. Another common limitation of deep learning systems is that they require large amounts of training data, which can be expensive to obtain. In representation learning, large datasets are leveraged to learn generic data representations that may be useful for efficient learning of arbitrary downstream tasks. This thesis is about structured representation learning. We study methods that learn, with little or no supervision, representations of unstructured data that capture its hidden structure. In the first part of the thesis, we focus on representations that disentangle the explanatory factors of variation of the data. We scale up disentangled representation learning to a novel robotic dataset, and perform a systematic large-scale study on the role of pretrained representations for out-of-distribution generalization in downstream robotic tasks. The second part of this thesis focuses on object-centric representations, which capture the compositional structure of the input in terms of symbol-like entities, such as objects in visual scenes. Object-centric learning methods learn to form meaningful entities from unstructured input, enabling symbolic information processing on a connectionist substrate. In this study, we train a selection of methods on several common datasets, and investigate their usefulness for downstream tasks and their ability to generalize out of distribution.

The Role of Pretrained Representations for the OOD Generalization of Reinforcement Learning Agents

On the Generalization of Learned Structured Representations

Pre-trained Visual Dynamics Representations for Efficient Policy Learning

DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors

Automatic Data Augmentation for Generalization in Deep Reinforcement Learning

On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness

Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning

Generalization and Regularization in DQN

Pre-training with Non-expert Human Demonstration for Deep Reinforcement Learning

GRAM: Generalization in Deep RL with a Robust Adaptation Module

Become a Proficient Player with Limited Data through Watching Pure Videos

A Survey Analyzing Generalization in Deep Reinforcement Learning

Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining

Improving generalization in reinforcement learning through forked agents

Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection

Visual Grounding for Object-Level Generalization in Reinforcement Learning

Zero-Shot Generalization of Vision-Based RL Without Data Augmentation

Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning

Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning

Generalizing in the Real World with Representation Learning

Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning