Information-Directed Policy Search in Sparse-Reward Settings via the Occupancy Information Ratio

Wesley A. Suttle, Alec Koppel, Ji Liu
DOI: https://doi.org/10.1109/ciss56502.2023.10089655
2023-01-01
Abstract: This paper examines the occupancy information ratio (OIR), a new measure of the exploration/exploitation trade-off in reinforcement learning (RL). It derives the Information-Directed Actor-Critic (IDAC) algorithm for solving the OIR problem, gives an overview of the theory underlying IDAC and related OIR policy gradient methods, and experimentally investigates the advantages of such methods. The central contribution is empirical evidence that, owing to the form of the OIR objective, IDAC outperforms vanilla RL methods in sparse-reward environments.
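The abstract does not spell out the OIR itself. The sketch below assumes the definition used in the authors' related OIR work as we understand it: the ratio of a policy's long-term average cost to the entropy of the state occupancy measure that policy induces. It omits any additive offset the paper may place in the denominator. It also assumes each fixed policy induces a small, ergodic, aperiodic tabular Markov chain. All function names are hypothetical. The example shows why the ratio form matters under sparse rewards. When cost is nearly constant across states, the average cost cannot tell two policies apart, but the OIR still prefers the policy that spreads its occupancy over more states.

```python
import math

def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration for d with d = d P, where P is row-stochastic.
    Assumes the chain is ergodic and aperiodic so the iteration converges."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, d)) < tol:
            return nxt
        d = nxt
    return d

def entropy(d):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in d if p > 0)

def average_cost(P, cost):
    """Long-run average cost: expectation of per-state cost under d."""
    d = stationary_distribution(P)
    return sum(p * c for p, c in zip(d, cost))

def occupancy_information_ratio(P, cost):
    """Assumed OIR: average cost divided by occupancy entropy (no offset).
    Undefined if the occupancy measure is degenerate (entropy 0)."""
    return average_cost(P, cost) / entropy(stationary_distribution(P))

# Sparse cost: every state costs 1 except the goal state 2, which costs 0.
cost = [1.0, 1.0, 0.0]
# Policy A keeps the agent almost always in state 0.
P_A = [[0.98, 0.01, 0.01]] * 3
# Policy B spreads the agent over states 0 and 1 (goal still rare).
P_B = [[0.495, 0.495, 0.01]] * 3

print(f"{average_cost(P_A, cost):.2f} {average_cost(P_B, cost):.2f}")  # 0.99 0.99
print(f"{occupancy_information_ratio(P_A, cost):.2f}")  # 8.85
print(f"{occupancy_information_ratio(P_B, cost):.2f}")  # 1.33
```

Both policies have the same average cost, so an average-cost objective is indifferent between them. Minimizing the assumed OIR favors the more exploratory policy B, which is the intuition behind the abstract's claim about sparse-reward environments.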