TPN: Triple network algorithm for deep reinforcement learning

Chen Han, Xuanyin Wang
DOI: https://doi.org/10.2139/ssrn.4606047
IF: 6
2024-04-25
Neurocomputing
Abstract: The target network method has been a foundation of deep reinforcement learning since DeepMind proposed it in 2015, and almost all currently popular reinforcement learning algorithms include a target network. However, while a slowly updated target network improves the stability of an algorithm, it also reduces its performance. In this paper, the authors design a novel triple-network algorithm (TPN). TPN combines the temporal-difference (TD) algorithm and the policy gradient (PG) theorem, using three networks to estimate the state value (v), the action value (q), and the policy (π). These networks have no primary or secondary distinction; they are trained synchronously and influence each other. The authors found that this TPN architecture greatly improves convergence and stability without increasing the amount of computation. Although TPN is only a basic framework at present, its calculation process is simple and easy to implement. Experiments show that the convergence speed and stability of TPN in discrete cases are better than those of PPO.
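The abstract does not state TPN's update rules, so the following is only an illustrative tabular sketch of how three synchronously trained estimators for v, q, and π might influence each other. It is not the authors' method. The names `tpn_step` and `train_bandit`, the use of a shared TD target for v and q, the choice of q(s, a) - v(s) as the policy-gradient weight, the table representation (instead of neural networks), and the two-armed bandit test problem are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tpn_step(v, q, logits, s, a, r, s_next, done, gamma=0.9, lr=0.1):
    """One synchronous update of the v, q, and policy tables (hypothetical rule).

    Assumption: v and q share the TD target r + gamma * v(s'), and the
    policy is updated with the advantage estimate q(s, a) - v(s).
    All three updates read the values held before this step.
    """
    pi = softmax(logits[s])
    target = r + (0.0 if done else gamma * v[s_next])
    advantage = q[s][a] - v[s]  # computed from pre-update values

    q[s][a] += lr * (target - q[s][a])  # TD update for the action value
    v[s] += lr * (target - v[s])        # TD update for the state value
    for b in range(len(pi)):
        # Gradient of log softmax w.r.t. logit b for the sampled action a.
        grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
        logits[s][b] += lr * advantage * grad_log_pi


def train_bandit(steps=500, seed=0):
    """Hypothetical test problem: one state, action 1 pays 1, action 0 pays 0."""
    rng = random.Random(seed)
    v = [0.0]
    q = [[0.0, 0.0]]
    logits = [[0.0, 0.0]]
    for _ in range(steps):
        pi = softmax(logits[0])
        a = 0 if rng.random() < pi[0] else 1
        r = float(a)
        tpn_step(v, q, logits, 0, a, r, 0, True)  # every episode ends after one step
    return v, q, softmax(logits[0])


if __name__ == "__main__":
    v, q, pi = train_bandit()
    print("v:", v, "q:", q, "pi:", pi)
```

In this sketch no estimator is a slowly updated copy of another: q bootstraps from v, and the policy is weighted by the gap between q and v, which is one possible reading of "trained synchronously and influence each other".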
computer science, artificial intelligence