A Preference-based Reinforcement Learning Approach Using Reward Exploration for Decision Making

Minghao Liu, Haoyu Liu, Min Yu, Qirong Tang
DOI: https://doi.org/10.1109/ccdc62350.2024.10587696
2024-01-01
Abstract: Preference-based reinforcement learning addresses decision-making problems using human preferences. By deriving a reward function from human preferences, the agent’s behavior aligns with human expectations without requiring complex reward design. However, in preference-based reinforcement learning, a human teacher can only select one trajectory segment from a pair, which limits reward function learning. Providing more diverse trajectory segments can help human teachers choose between two segments. To address this issue, a method is proposed that encourages exploration in the reward model, which in turn drives the agent to explore and generate a more diverse set of trajectories for the human teacher to select from. Numerical studies demonstrate the effectiveness of the proposed method and its superiority over other preference-based reinforcement learning methods.