On Efficient Sampling in Supervisory Reinforcement Learning

Qing-Shan Jia, Qi Guo, Yongcai Wang
DOI: https://doi.org/10.23919/ccc63176.2024.10662039
2024-01-01
Abstract: Supervisory reinforcement learning is the process of learning the best policy from interaction with both the environment and a supervisor. These two channels of interaction usually provide guidance with different levels of accuracy: feedback from the environment is usually noisy and random, whereas guidance from the supervisor is usually experienced and accurate. It is of practical interest to understand how to obtain samples efficiently from both channels. We consider this problem in this work and make the following contributions. First, we formulate the sample optimization problem in supervisory reinforcement learning. Second, we convert that problem into maximizing the probability of correct selection (PCS) of the best policy under a limited sample budget in both channels. Third, we analyze how best to use the guidance from the supervisor and how to interact with the environment on selected (state, action) pairs. An algorithm is presented and shown to asymptotically maximize the PCS. We hope this work sheds light on the study of sample efficiency in more general settings.