Hyper: Hyperparameter Robust Efficient Exploration in Reinforcement Learning

Yiran Wang, Chenshu Liu, Yunfan Li, Sanae Amani, Bolei Zhou, Lin F. Yang
2024-12-05
Abstract: The exploration & exploitation dilemma poses significant challenges in reinforcement learning (RL). Recently, curiosity-based exploration methods achieved great success in tackling hard-exploration problems. However, they necessitate extensive hyperparameter tuning on different environments, which heavily limits the applicability and accessibility of this line of methods. In this paper, we characterize this problem via analysis of the agent behavior, concluding the fundamental difficulty of choosing a proper hyperparameter. We then identify the difficulty and the instability of the optimization when the agent learns with curiosity. We propose our method, hyperparameter robust exploration (**Hyper**), which extensively mitigates the problem by effectively regularizing the visitation of the exploration and decoupling the exploitation to ensure stable training. We theoretically justify that **Hyper** is provably efficient under function approximation setting and empirically demonstrate its appealing performance and robustness in various environments.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the exploration-exploitation dilemma in reinforcement learning (RL), focusing on the hyperparameter sensitivity of curiosity-based exploration. Although curiosity-based methods have achieved remarkable success on hard-exploration problems, they require extensive hyperparameter tuning for each environment, which greatly limits their applicability and ease of use. The paper identifies two main problems: selecting appropriate hyperparameters (especially the curiosity hyperparameter \(\beta\)) is difficult, and the optimization process is difficult and unstable when the agent learns through curiosity.

To address these problems, the paper proposes a new method, Hyperparameter Robust Exploration (Hyper). Hyper adjusts the visitation distribution of exploration and decouples exploration from exploitation to ensure stable training. Specifically, Hyper achieves its goals in the following ways:

1. **Adjusting the visitation distribution**: Hyper reduces the optimization instability caused by frequent policy changes by increasing the persistence of exploration.
2. **Decoupling exploration and exploitation**: Hyper uses an additional policy to decouple exploration learning and exploitation learning to prevent over-exploration.

The paper also proves theoretically that Hyper is efficient in the function approximation setting and demonstrates experimentally its strong performance and robustness in various environments.
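To make the role of \(\beta\) and the decoupling idea concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes the common curiosity formulation in which the exploration learner is trained on the extrinsic reward plus \(\beta\) times an intrinsic bonus, and it swaps in a simple count-based bonus for whatever curiosity signal the authors use. The names `CountBonus` and `split_rewards` are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


class CountBonus:
    """Illustrative count-based curiosity bonus: 1 / sqrt(visit count).

    This is a stand-in for a curiosity signal, not the paper's bonus.
    """

    def __init__(self):
        self.counts = defaultdict(int)

    def __call__(self, state):
        # Record the visit, then return a bonus that shrinks with repeated visits.
        self.counts[state] += 1
        return 1.0 / math.sqrt(self.counts[state])


def split_rewards(r_ext, r_int, beta):
    """Return (exploration_reward, exploitation_reward) for one transition.

    Assumption: the exploration learner sees r_ext + beta * r_int, while a
    separate exploitation learner sees only r_ext, so the choice of beta
    cannot distort the exploitation objective.
    """
    return r_ext + beta * r_int, r_ext


if __name__ == "__main__":
    bonus = CountBonus()
    for state in ["s0", "s0", "s1"]:
        r_explore, r_exploit = split_rewards(r_ext=0.0, r_int=bonus(state), beta=0.5)
        print(state, r_explore, r_exploit)
```

In this sketch, a large \(\beta\) inflates only the exploration reward; the exploitation reward stays equal to the extrinsic reward, which is one simple way to read the "additional policy" described in item 2.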