Abstract: The aim of inverse reinforcement learning (IRL) is to infer an agent's preferences from observing their behaviour. Usually, preferences are modelled as a reward function, $R$, and behaviour is modelled as a policy, $\pi$. One of the central difficulties in IRL is that multiple preferences may lead to the same observed behaviour. That is, $R$ is typically underdetermined by $\pi$, which means that $R$ is only partially identifiable. Recent work has characterised the extent of this partial identifiability for different types of agents, including optimal and Boltzmann-rational agents. However, work so far has only considered agents that discount future reward exponentially: this is a serious limitation, especially given that extensive work in the behavioural sciences suggests that humans are better modelled as discounting hyperbolically. In this work, we characterise partial identifiability in IRL for agents with non-exponential discounting: our results are in particular relevant for hyperbolic discounting, but they also apply more generally to agents that use other types of (non-exponential) discounting. Significantly, we show that IRL is generally unable to infer enough information about $R$ to identify the correct optimal policy, which entails that IRL alone can be insufficient to adequately characterise the preferences of such agents.

What problem does this paper attempt to address?

This paper extends the study of partial identifiability in inverse reinforcement learning (IRL) to agents with non-exponential discounting. Specifically, it examines the difficulties and limitations of inferring the reward function from an agent's behavior when the agent uses a non-exponential discount function.

### Background of the Paper

The goal of IRL is to infer an agent's preferences from its behavior. Usually, the preferences are modeled as a reward function $R$, and the behavior is modeled as a policy $\pi$. A core challenge is that multiple different reward functions may lead to the same behavior, so $R$ is usually only partially identifiable: it cannot be uniquely determined by observing behavior alone. Existing research mainly focuses on agents with exponential discount functions. However, behavioral science research suggests that humans are better modeled as discounting hyperbolically. Existing results are therefore limited, because they do not cover non-exponential discounting.

### Research Questions

This paper aims to explore the following questions:

1. **Characteristics of partial identifiability under non-exponential discounting**: How can the partial identifiability of the reward function in IRL be described and quantified under non-exponential discounting?
2. **The impact of different discount functions on partial identifiability**: In particular, how do hyperbolic discounting and other forms of non-exponential discounting affect the performance of IRL algorithms?
3. **Limitations of existing IRL methods**: Under non-exponential discounting, can existing IRL methods accurately infer the correct optimal policy?
### Main Contributions

The main contributions of this paper include:

- **Introduction of new behavior models**: Three new behavior models are proposed for agents with general discount functions, namely resolute, naïve, and sophisticated policies.
- **Exact characterization of partial identifiability**: The partial identifiability of the reward function under these new behavior models is studied, and exact characterizations and comparisons between models are provided.
- **Theoretical results**: It is proved that, under non-exponential discounting, IRL is in general unable to infer enough information about $R$ from the observed behavior to identify the correct optimal policy, which indicates that IRL alone may not be sufficient to fully characterize the preferences of such agents.

### Formulas and Definitions

The key formulas and definitions involved in the paper are as follows:

1. **Hyperbolic discount function**:
\[ d(t)=\frac{1}{1 + kt} \]
where $k\in(0,\infty)$ is a parameter.
2. **Trajectory return function**:
\[ G(\xi)=\sum_{t = 0}^{|\xi|}d(t)\cdot R(s_t,a_t,s_{t + 1}) \]
3. **Value function**:
\[ V_\pi(\xi)=\mathbb{E}\left[\sum_{t = 0}^{\infty}d(t + n)\cdot R(\zeta_t)\right] \]
where $\zeta$ is a trajectory formed by starting from the initial trajectory $\xi$ and sampling subsequent actions according to the policy $\pi$.
4. **Q-function**:
\[ Q_\pi(s,a)=\mathbb{E}_{S'\sim\tau(s,a)}[R(s,a,S')+d(1)V_\pi(S')] \]
5. **Boltzmann resolute behavior model**:
\[ P(\pi(\xi)=a)\propto\exp(\beta Q_R(s,|\xi|,a)) \]
where $s$ is the last state of the trajectory $\xi$.
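
To make the discount function, the trajectory return, and the Boltzmann form listed above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names, the default values of `k`, `gamma`, and `beta`, and the representation of a trajectory as a plain list of per-step rewards are all hypothetical. The Q-values passed to `boltzmann_policy` are assumed to be given; computing them for resolute, naïve, or sophisticated agents is not attempted here.

```python
import math


def hyperbolic_discount(t, k=1.0):
    """Hyperbolic discount d(t) = 1 / (1 + k t), with k > 0 (default k is an assumption)."""
    return 1.0 / (1.0 + k * t)


def exponential_discount(t, gamma=0.9):
    """Exponential discount gamma ** t, included only for comparison."""
    return gamma ** t


def trajectory_return(rewards, discount):
    """Discounted return: the sum over t of discount(t) * rewards[t].

    `rewards` is a hypothetical stand-in for the values R(s_t, a_t, s_{t+1})
    along one finite trajectory.
    """
    return sum(discount(t) * r for t, r in enumerate(rewards))


def step_ratios(discount, steps):
    """Ratios discount(t + 1) / discount(t) for t = 0 .. steps - 1."""
    return [discount(t + 1) / discount(t) for t in range(steps)]


def boltzmann_policy(q_values, beta=1.0):
    """Action probabilities proportional to exp(beta * Q).

    The maximum is subtracted before exponentiating for numerical stability;
    this does not change the resulting probabilities.
    """
    scaled = [beta * q for q in q_values]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print("hyperbolic return: ", trajectory_return(rewards, hyperbolic_discount))
    print("exponential return:", trajectory_return(rewards, exponential_discount))
    # The one-step ratio is constant for exponential discounting but grows
    # with t for hyperbolic discounting.
    print("hyperbolic ratios: ", step_ratios(hyperbolic_discount, 3))
    print("exponential ratios:", step_ratios(exponential_discount, 3))
    print("Boltzmann policy:  ", boltzmann_policy([1.0, 2.0, 0.5], beta=2.0))
```

The `step_ratios` helper is there to show one concrete difference between the two discounting schemes: with exponential discounting, each additional step of delay shrinks a reward by the same factor, whereas with hyperbolic discounting that factor depends on how far in the future the reward lies.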

Partial Identifiability in Inverse Reinforcement Learning For Agents With Non-Exponential Discounting

Partial Identifiability and Misspecification in Inverse Reinforcement Learning

Inverse Reinforcement Learning with Unknown Reward Model based on Structural Risk Minimization

Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification

Towards Theoretical Understanding of Inverse Reinforcement Learning

Convergence Analysis of an Incremental Approach to Online Inverse Reinforcement Learning

Misspecification in Inverse Reinforcement Learning

Identifiability and Generalizability in Constrained Inverse Reinforcement Learning

Inverse Reinforcement Learning with Explicit Policy Estimates

On the Effective Horizon of Inverse Reinforcement Learning

When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

How does Inverse RL Scale to Large State Spaces? A Provably Efficient Approach

On Multi-Agent Inverse Reinforcement Learning

Modeling and Interpreting Real-world Human Risk Decision Making with Inverse Reinforcement Learning

Bayesian Inverse Reinforcement Learning for Non-Markovian Rewards

Inverse Reinforcement Learning with Sub-optimal Experts

Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms

IV-Posterior: Inverse Value Estimation for Interpretable Policy Certificates

Inverse Reinforcement Learning for Marketing

Multi-intention Inverse Q-learning for Interpretable Behavior Representation

Inverse Reinforcement Learning with Multiple Planning Horizons