Abstract: Reward rate maximization is a prominent normative principle commonly held in behavioral ecology, neuroscience, economics, and artificial intelligence. Here, we identify and compare equations for evaluating the worth of initiating pursuits that an agent could implement to enable reward-rate maximization. We identify two fundamental temporal decision-making categories requiring the valuation of the initiation of a pursuit (forgo and choice decision-making), over which we generalize and analyze the optimal solution for how to evaluate a pursuit in order to maximize reward rate. From this reward-rate-maximizing formulation, we derive expressions for the subjective value of a pursuit, i.e., that pursuit’s equivalent immediate reward magnitude, and reveal that time’s cost is composed of an apportionment cost in addition to an opportunity cost. By re-expressing subjective value as a temporal discounting function, we show precisely how the temporal discounting function of a reward-rate-optimal agent is sensitive not just to the properties of a considered pursuit, but also to the time spent and reward acquired outside of the pursuit for every instance spent within it. In doing so, we demonstrate how the apparent discounting function of a reward-rate-optimizing agent depends on the temporal structure of the environment and is a combination of hyperbolic and linear components, whose contributions relate to the apportionment and opportunity cost of time, respectively. We then show how purported signs of suboptimal behavior (hyperbolic discounting, the “Magnitude” effect, the “Sign” effect) are in fact consistent with reward rate maximization. In clarifying which features are, and are not, signs of optimal decision-making, we then analyze the impact of misestimating the identified reward-rate-maximizing parameters to best account for the pattern of errors actually observed in humans and animals.
We find that errors in agents’ assessment of the apportionment of time inside versus outside a considered pursuit type are the likely driver of the suboptimal temporal decision-making observed behaviorally, which we term the ‘Malapportionment Hypothesis’. By providing a generalized form for reward rate maximization, and by relating it to subjective value and temporal discounting, the true pattern of errors exhibited by humans and animals can now be more deeply understood, identified, and quantified, which is key to deducing the learning algorithms and representational architectures actually used by humans and animals to evaluate the worth of pursuits.
