Abstract: Present bias, the tendency to weigh costs and benefits incurred in the present too heavily, is one of the most widespread human behavioral biases. It has also been the subject of extensive study in the behavioral economics literature. While the simplest models assume that decision-making agents are naive, reasoning about the future without taking their bias into account, there is considerable evidence that people often behave in ways that are sophisticated with respect to present bias, making plans based on the belief that they will be present-biased in the future. For example, committing to a course of action to reduce future opportunities for procrastination or overconsumption is an instance of sophisticated behavior in everyday life. Models of sophisticated behavior have lacked an underlying formalism that allows one to reason over the full space of multi-step tasks that a sophisticated agent might face, and this has made it correspondingly difficult to make comparative or worst-case statements about the performance of sophisticated agents in arbitrary scenarios. In this paper, we incorporate the framework of sophistication into a graph-theoretic model that we used in recent work for modeling naive agents. This new synthesis of two formalisms, sophistication and graph-theoretic planning, uncovers a rich structure that was not apparent in the earlier behavioral economics work on this problem, including a range of findings that shed new light on sophisticated behavior. In particular, our graph-theoretic model makes two kinds of new results possible. First, we give tight worst-case bounds on the performance of sophisticated agents in arbitrary multi-step tasks relative to the optimal plan, along with worst-case bounds for related questions. 
Second, the flexibility of our formalism makes it possible to identify new phenomena about sophisticated agents that had not been seen in prior literature: these include a surprising non-monotonic property in the use of rewards to motivate sophisticated agents; a sharp distinction in the performance of agents who overestimate versus underestimate their level of present bias; and a framework for reasoning about commitment devices that shows how certain classes of commitments can produce large gains for arbitrary tasks.
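The contrast between naive and sophisticated agents can be made concrete with a small sketch. It assumes a common graph formulation that the abstract does not spell out: tasks form a directed acyclic graph from a start node to a single target `t` (the only node with no outgoing edges), edges carry nonnegative costs, and an agent with present-bias parameter `bias` (at least 1) inflates only the cost of the edge it is about to take. Under this assumption, a naive agent at node v picks the edge (v, w) minimizing `bias * c(v, w) + d(w)`, where d is the true shortest-path distance, because it believes its future selves will behave optimally. A sophisticated agent instead uses backward induction: it first works out which path its present-biased future self will actually follow from each w, and evaluates `bias * c(v, w)` plus the true cost of that path. All function names and the example graph below are hypothetical illustrations, not the paper's own code.

```python
from typing import Dict, List, Tuple

# Hypothetical task graph: node -> list of (successor, edge cost).
# Assumed to be a DAG whose only sink is the target "t".
Graph = Dict[str, List[Tuple[str, float]]]


def topo_order(graph: Graph) -> List[str]:
    """Return the nodes so that each node appears after all of its successors."""
    order: List[str] = []
    seen = set()

    def visit(v: str) -> None:
        if v in seen:
            return
        seen.add(v)
        for w, _ in graph.get(v, []):
            visit(w)
        order.append(v)  # post-order: successors are appended first

    for v in graph:
        visit(v)
    return order


def shortest_dist(graph: Graph) -> Dict[str, float]:
    """True (unbiased) cost of the cheapest path from each node to the sink."""
    d: Dict[str, float] = {}
    for v in topo_order(graph):
        edges = graph.get(v, [])
        d[v] = 0.0 if not edges else min(c + d[w] for w, c in edges)
    return d


def naive_path(graph: Graph, start: str, bias: float) -> Tuple[List[str], float]:
    """Path a naive agent follows: it assumes its future self plans optimally."""
    d = shortest_dist(graph)
    path, cost, v = [start], 0.0, start
    while graph.get(v):
        w, c = min(graph[v], key=lambda e: bias * e[1] + d[e[0]])
        path.append(w)
        cost += c
        v = w
    return path, cost


def sophisticated_path(graph: Graph, start: str, bias: float) -> Tuple[List[str], float]:
    """Path a sophisticated agent follows, computed by backward induction."""
    plan: Dict[str, str] = {}   # edge the (biased) agent actually takes at each node
    real: Dict[str, float] = {}  # true cost of the path it actually follows from each node
    for v in topo_order(graph):
        edges = graph.get(v, [])
        if not edges:
            real[v] = 0.0
            continue
        # Only the immediate edge is inflated; the rest is what the future self will really do.
        w, c = min(edges, key=lambda e: bias * e[1] + real[e[0]])
        plan[v] = w
        real[v] = c + real[w]
    path, v = [start], start
    while v in plan:
        v = plan[v]
        path.append(v)
    return path, real[start]


# Hypothetical procrastination chain: at each v_i the agent either does the task
# now (edge to "t") or defers for free to v_{i+1}; doing it later costs more.
chain: Graph = {
    "v0": [("t", 2), ("v1", 0)],
    "v1": [("t", 3), ("v2", 0)],
    "v2": [("t", 4), ("v3", 0)],
    "v3": [("t", 5)],
}

# One simple way to model a commitment device (an assumption of this sketch):
# delete the option to defer at v0.
committed: Graph = {**chain, "v0": [("t", 2)]}

print(shortest_dist(chain)["v0"])            # 2.0
print(naive_path(chain, "v0", 2.0))          # (['v0', 'v1', 'v2', 'v3', 't'], 5.0)
print(sophisticated_path(chain, "v0", 2.0))  # (['v0', 't'], 2.0)
print(naive_path(committed, "v0", 2.0))      # (['v0', 't'], 2.0)
```

In this toy chain the naive agent keeps deferring, expecting to act at the next step, and ends up paying 5 instead of the optimal 2. The sophisticated agent foresees that its future selves would defer all the way to v3, so it acts immediately. Removing the defer edge at v0 gives the naive agent the same outcome. This single instance only illustrates the mechanics; it says nothing about the worst-case bounds discussed in the abstract.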

On shallow planning under partial observability

Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning

Covert Planning against Imperfect Observers

Risk-Averse Planning Under Uncertainty

On the Benefits of Leveraging Structural Information in Planning Over the Learned Model

Inverse Reinforcement Learning with Multiple Planning Horizons

The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning

A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings

Learning Abstract World Model for Value-preserving Planning with Options

Think Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning

Planning under periodic observations: bounds and bounding-based solutions

Planning with Multiple Biases

Planning with RL and episodic-memory behavioral priors

On the role of planning in model-based deep reinforcement learning

Planning Problems for Sophisticated Agents with Present Bias

Efficient Reinforcement Learning with Impaired Observability: Learning to Act with Delayed and Missing State Observations

On the Effective Horizon of Inverse Reinforcement Learning

Existence and Finiteness Conditions for Risk-Sensitive Planning: Results and Conjectures

A New View on Planning in Online Reinforcement Learning

Experiment Planning with Function Approximation

Iterative Option Discovery for Planning, by Planning