Entropy adjustment by interpolation for exploration in Proximal Policy Optimization (PPO)
Ayoub Boudlal, Abderahim Khafaji, Jamal Elabbadi
DOI: https://doi.org/10.1016/j.engappai.2024.108401
IF: 8
2024-04-25
Engineering Applications of Artificial Intelligence
Abstract: The proposed algorithm, EAE-LPI (Exploration by Adjustment Entropy via Linear and Polynomial Interpolation), aims to enhance exploration in the Proximal Policy Optimization (PPO) algorithm by addressing two aspects that previous research has left underdeveloped. The first is an entropy term, adjusted by linear interpolation, that promotes exploration. This adjustment reduces randomness by distinguishing random fluctuations from significant policy improvements. The second is polynomial interpolation: a Lagrange polynomial is constructed from existing data points, so that knowledge about neighboring states obtained through interpolation can be used to explore previously uncharted areas and to reinforce interactions with the environment. EAE-LPI is intended to overcome the limitations of static entropy effects and of basic entropy regularization strategies (linear, polynomial, exponential) in exploration control.
Keywords: automation & control systems; computer science, artificial intelligence; engineering, electrical & electronic, multidisciplinary
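
The abstract names two building blocks, linear interpolation of an entropy term and a Lagrange polynomial built from existing data points, without giving formulas. The following is a minimal Python sketch of those two generic techniques, not the authors' implementation. The function names, the default coefficient values, the choice of training progress as the interpolation variable, and the way the entropy bonus enters the loss (the standard PPO form) are all assumptions made for illustration.

```python
from typing import Sequence


def lerp_entropy_coef(step: int, total_steps: int,
                      coef_start: float = 0.01,
                      coef_end: float = 0.001) -> float:
    """Linearly interpolate an entropy coefficient over training progress.

    Assumption: the coefficient moves from coef_start to coef_end as
    step goes from 0 to total_steps. The abstract does not state which
    quantity EAE-LPI actually interpolates over.
    """
    t = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return (1.0 - t) * coef_start + t * coef_end


def lagrange_interpolate(xs: Sequence[float], ys: Sequence[float],
                         x: float) -> float:
    """Evaluate the Lagrange polynomial through (xs[i], ys[i]) at x."""
    if len(xs) != len(ys) or len(set(xs)) != len(xs):
        raise ValueError("xs and ys must match in length; xs must be distinct")
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0  # i-th Lagrange basis polynomial evaluated at x
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def ppo_loss_with_entropy(policy_loss: float, value_loss: float,
                          entropy: float, entropy_coef: float,
                          value_coef: float = 0.5) -> float:
    """Standard PPO total loss with an entropy bonus (hypothetical helper).

    A larger entropy_coef rewards higher policy entropy, i.e. more
    exploration.
    """
    return policy_loss + value_coef * value_loss - entropy_coef * entropy


if __name__ == "__main__":
    # Entropy coefficient halfway through a 1000-step run.
    coef = lerp_entropy_coef(step=500, total_steps=1000)
    # Estimate a value at an unvisited point from three known points.
    estimate = lagrange_interpolate([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 1.5)
    print(coef, estimate)
    print(ppo_loss_with_entropy(1.0, 2.0, 3.0, coef))
```

In a PPO training loop, the interpolated coefficient would typically be recomputed at each update and passed to the loss. How EAE-LPI combines the Lagrange estimate with the entropy adjustment is described in the paper itself and is not reproduced here.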