Action Values Based Reinforcement Learning and Optimized Reward Functions

Qijun CHEN, Yunwei XIAO
DOI: https://doi.org/10.3321/j.issn:0253-374X.2007.04.021
2007-01-01
Abstract: A new reinforcement learning algorithm is proposed in which an agent chooses actions on the basis of 'action values', with the aim of improving the design of reward signals. Because action values are more flexible than traditional state values, they make it easier to design better-optimized reward functions and to improve learning performance. Based on the action values, an exponential function and a logarithmic function are used to compute the action rewards and the discount rate dynamically, which helps agents select optimized actions sooner. Computer simulation of a maze problem shows that the new algorithm reduces the number of actions taken before convergence and therefore increases the convergence speed.
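The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea under stated assumptions: a tabular, Q-learning-style agent on a small maze in which the reward and the discount rate are recomputed from the current action value at every update. The functions `shaped_reward` and `dynamic_discount`, their constants (`beta`, `g_min`, `g_max`, `c`), the 4x4 maze layout, and the base rewards are all hypothetical choices made for illustration, not the authors' definitions.

```python
import math
import random

# Hypothetical shaping forms: the abstract names an exponential and a
# logarithmic function but does not state them, so these are assumptions.
def shaped_reward(base_reward, q, beta=0.05):
    """Add an exponential bonus that grows with the action value, capped at beta."""
    return base_reward + beta * (1.0 - math.exp(-max(q, 0.0)))

def dynamic_discount(q, g_min=0.5, g_max=0.95, c=0.2):
    """Logarithmic discount rate: larger action values look further ahead."""
    return min(g_max, g_min + c * math.log1p(max(q, 0.0)))

# Assumed toy maze: 4x4 grid, start at (0, 0), goal at (3, 3), three walls.
SIZE = 4
START, GOAL = (0, 0), (3, 3)
WALLS = {(1, 1), (2, 1), (1, 3)}
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Move one cell; bumping into a wall or the border leaves the state unchanged."""
    row, col = state
    d_row, d_col = MOVES[action]
    nxt = (row + d_row, col + d_col)
    if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE) or nxt in WALLS:
        nxt = state
    done = nxt == GOAL
    return nxt, (10.0 if done else -0.1), done

def train(episodes=300, alpha=0.5, epsilon=0.1, max_steps=200, seed=0):
    """Epsilon-greedy tabular learning; returns the table and steps per episode."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
    steps_per_episode = []
    for _ in range(episodes):
        state, steps, done = START, 0, False
        while not done and steps < max_steps:
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                best = max(q[state])
                action = rng.choice([a for a in range(4) if q[state][a] == best])
            nxt, base, done = step(state, action)
            q_sa = q[state][action]
            # Reward and discount are both derived from the current action value.
            target = shaped_reward(base, q_sa)
            if not done:
                target += dynamic_discount(q_sa) * max(q[nxt])
            q[state][action] = q_sa + alpha * (target - q_sa)
            state, steps = nxt, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode

if __name__ == "__main__":
    _, history = train()
    print("steps in first 5 episodes:", history[:5])
    print("steps in last 5 episodes:", history[-5:])
```

In this sketch the bonus is kept smaller than the per-step cost so that wandering never pays more than reaching the goal, and the discount is clipped below 1 so the values stay bounded. Comparing the early and late entries of `history` gives a rough analogue of the "action times before convergence" measure mentioned in the abstract.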