Abstract:Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent. This is because partial observability gives rise to hidden confounders in the causal graph. In previous work, to work around the confounding problem, policies have been trained using query access to the expert's policy or inverse reinforcement learning (IRL). However, both approaches have drawbacks as the expert's policy may not be available and IRL can be unstable in practice. Instead, we propose to train a variational inference model to infer the expert's latent information and use it to train a latent-conditional policy. We prove that using this method, under strong assumptions, the identification of the correct imitation learning policy is theoretically possible from expert demonstrations alone. In practice, we focus on a setting with less strong assumptions where we use exploration data for learning the inference model. We show in theory and practice that this algorithm converges to the correct interventional policy, solves the confounding issue, and can under certain assumptions achieve an asymptotically optimal imitation performance.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is that in imitation learning, when the expert demonstrator has more sensory input information than the imitator, standard imitation learning methods may fail. Specifically, due to partial observability, there are hidden confounders in the causal graph, which makes it difficult for the imitator to accurately imitate the expert's behavior. For example, in the autonomous driving scenario, a human driver can adjust the driving speed according to the weather forecast, while an autonomous vehicle may not be able to obtain the same weather forecast information, and thus may not be able to react correctly when facing an icy road. To solve this problem, the paper proposes a method based on variational inference. By training a variational inference model to infer the expert's latent information and using this information to train a policy that depends on the latent condition. The paper proves that under strong assumptions, the correct imitation - learning policy can be identified solely from expert demonstrations. In practical applications, the researchers focus on a setting with weaker assumptions, in which exploration data is used to learn the inference model. The paper shows that this algorithm can not only converge to the correct intervention policy and solve the confounding problem, but also achieve asymptotically optimal imitation performance under certain assumptions. ### Main contributions of the paper: 1. **Proposed a practical method based on variational inference**: This method can solve the confounding imitation - learning problem without expert queries. Experiments have proven that this method outperforms traditional behavior cloning (BC) and inverse reinforcement learning (IRL) methods in various confounding control problems. 2. **Proposed a theoretical method**: This method can solve the confounding imitation - learning problem purely theoretically from offline data, without the need for expert queries or exploration. 3. **Provided theoretical insights**: Under strong assumptions, it is proved that the expert's policy can be identified through this method. ### Key technical details: - **Variational inference model**: Used to learn the distribution of latent variables from exploration data. - **Latent - condition policy**: Depends on the entire history of interaction information rather than the current observation, so it can infer the value of latent variables. - **Intervention policy**: By treating past actions as interventions, the problem of causal illusions is avoided. ### Experimental verification: The paper verifies the effectiveness of this method in a series of confounding imitation - learning problems in high - dimensional observation and action spaces, demonstrating its potential in practical applications. In conclusion, this paper provides a new method to solve the confounding problem in imitation learning, especially in cases where there is information asymmetry between the expert and the imitator. This method not only has strong theoretical guarantees but also performs well in practice.

Deconfounding Imitation Learning with Variational Inference

Off-Dynamics Inverse Reinforcement Learning

Adversarial Imitation via Variational Inverse Reinforcement Learning

Initial State Interventions for Deconfounded Imitation Learning

Offline Imitation Learning with Variational Counterfactual Reasoning

Regularizing Adversarial Imitation Learning Using Causal Invariance

A Bayesian Solution To The Imitation Gap

Invariant Causal Imitation Learning for Generalizable Policies

Adversarial Imitation Learning from Visual Observations using Latent Information

InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations

Offline Causal Imitation Learning with Latent Confounders

Visual Adversarial Imitation Learning using Variational Models

Learning Discrete State Abstractions With Deep Variational Inference

Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching

Sequential Causal Imitation Learning with Unobserved Confounders

Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms

IV-Posterior: Inverse Value Estimation for Interpretable Policy Certificates

Learning Causally Invariant Reward Functions from Diverse Demonstrations

Domain-Robust Visual Imitation Learning with Mutual Information Constraints

Efficient Imitation Learning with Conservative World Models