Orientation-Preserving Rewards’ Balancing in Reinforcement Learning

Jinsheng Ren, Shangqi Guo, Feng Chen
DOI: https://doi.org/10.1109/tnnls.2021.3080521
IF: 14.255
2022-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Auxiliary rewards are widely used in complex reinforcement learning tasks. However, previous work can hardly avoid the interference of auxiliary rewards on pursuing the main rewards, which leads to the destruction of the optimal policy. Thus, it is challenging but essential to balance the main and auxiliary rewards. In this article, we explicitly formulate the problem of rewards’ balancing as searching for a Pareto optimal solution, with the overall objective of preserving the policy’s optimization orientation for the main rewards (i.e., the policy driven by the balanced rewards is consistent with the policy driven by the main rewards). To this end, we propose a variant Pareto and show that it can effectively guide the policy search toward more main rewards. Furthermore, we establish an iterative learning framework for rewards’ balancing and theoretically analyze its convergence and time complexity. Experiments in both discrete (grid world) and continuous (Doom) environments demonstrated that our algorithm can effectively balance rewards and achieve remarkable performance compared with RL methods that use heuristically designed rewards. In the ViZDoom platform, our algorithm can learn expert-level policies.