Abstract: Value function approximation, such as Q-learning, is used far more widely in discrete control than in continuous control because the optimal action is easier to select in the discrete setting. In the continuous setting, selecting the action is a non-convex optimization problem with respect to a complex value function. Some notable studies simplify this problem by assuming that the value function is quadratic in the actions or by discretizing the action space. However, the performance of the resulting policy declines when these assumptions do not hold. To address this problem, we propose a framework that combines swarm intelligence algorithms with value-based Reinforcement Learning, in which the swarm intelligence algorithm searches for the optimal action given the state and the value function. To support the correctness of this framework, we give a conditional convergence rate for swarm intelligence algorithms that holds with high probability. We then implement the framework by searching for the optimal actions of a batch of states on the GPU platform, for use in batch training. Furthermore, we employ population-based atomic actions for compatibility with existing work on discrete control problems. Four classical control models and four robot simulation environments are used in the comparisons. According to the empirical results, our framework outputs a policy comparable with that of policy-based algorithms using 10% of the timesteps in continuous control.

Note to Practitioners: This paper is motivated by the exploration-exploitation dilemma that Reinforcement Learning faces in continuous control tasks. Stochastic exploration (e.g., $\varepsilon$-greedy) and prioritized exploration are roughly the two feasible ways to balance exploration and exploitation, and the prioritized one is the better choice because of its higher data efficiency. Normally, prioritized exploration works well in value-based Reinforcement Learning algorithms rather than policy-based ones; meanwhile, policy-based algorithms are better suited to continuous control tasks than value-based ones. To resolve this conflict, we design a particle swarm optimization that maximizes the Q-value of the action in Q-learning. Our design can be hybridized with various swarm intelligence and value-based Reinforcement Learning algorithms, and it can be embedded easily in most intelligent control systems. The aim of this study is to solve continuous control tasks with value-based algorithms as a first step toward applying prioritized exploration. The simulation results verify the effectiveness and efficiency of our design.
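The core idea of the abstract, using a swarm to search for the action that maximizes Q(s, a) for a whole batch of states, can be illustrated with a minimal sketch. This is not the authors' implementation. The names `toy_q` and `pso_argmax_q`, the inertia and acceleration coefficients, the action bounds, and the toy Q-function are all hypothetical choices made for illustration. The sketch uses NumPy on the CPU rather than a GPU, and a plain global-best particle swarm rather than the specific variant the paper designs.

```python
import numpy as np


def toy_q(states, actions):
    """Hypothetical stand-in for a learned Q(s, a), non-convex in the action.

    states: (B, S); actions: (B, P, A); returns Q-values of shape (B, P).
    For each state the global maximum lies at a = 0.5 * tanh(s[:A]).
    """
    centre = 0.5 * np.tanh(states[:, None, :actions.shape[-1]])
    diff = actions - centre
    return np.sum(np.cos(5.0 * diff) - diff ** 2, axis=-1)


def pso_argmax_q(q_fn, states, action_dim, low=-1.0, high=1.0,
                 n_particles=32, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Run one independent swarm per state; all swarms advance together."""
    rng = np.random.default_rng(seed)
    batch = states.shape[0]
    rows = np.arange(batch)

    # Each particle is a candidate action for its state.
    pos = rng.uniform(low, high, size=(batch, n_particles, action_dim))
    vel = np.zeros_like(pos)
    pbest_pos = pos.copy()            # best action seen by each particle
    pbest_q = q_fn(states, pos)       # its Q-value, shape (B, P)

    for _ in range(n_iters):
        # Best action found so far by each state's swarm, shape (B, 1, A).
        gbest_pos = pbest_pos[rows, np.argmax(pbest_q, axis=1)][:, None, :]

        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = (w * vel
               + c1 * r1 * (pbest_pos - pos)
               + c2 * r2 * (gbest_pos - pos))
        pos = np.clip(pos + vel, low, high)   # keep actions inside the bounds

        q = q_fn(states, pos)
        improved = q > pbest_q
        pbest_pos = np.where(improved[..., None], pos, pbest_pos)
        pbest_q = np.where(improved, q, pbest_q)

    best = np.argmax(pbest_q, axis=1)
    return pbest_pos[rows, best], pbest_q[rows, best]


# Usage: greedy actions for a batch of 8 random 3-dimensional states.
states = np.random.default_rng(1).normal(size=(8, 3))
best_actions, best_q = pso_argmax_q(toy_q, states, action_dim=2)
```

In a Q-learning loop, `best_q` evaluated at the next states would play the role of the max over actions in the bootstrapped target, and `best_actions` would serve as the greedy policy output. Because every array carries a leading batch dimension, the same structure maps naturally onto a GPU array library, which is the kind of batched search the abstract describes.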

On the continuity and smoothness of the value function in reinforcement learning and optimal control

Continuity of the value function in sparse optimal control

Differentiability of the value function on of semilinear parabolic infinite time horizon optimal control problems under control constraints

Reinforcement learning in continuous time and space

Subgradient evolution of value functions in discrete-time optimal control

On the stability of Lipschitz continuous control problems and its application to reinforcement learning

Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory

Control theory approach to continuous-time finite state mean field games

Continuous Control With Swarm Intelligence Based Value Function Approximation

A continuous-time fundamental lemma and its application in data-driven optimal control

Stochastic optimal control in Hilbert spaces: $C^{1,1}$ regularity of the value function and optimal synthesis via viscosity solutions

Analysis of the vanishing discount limit for optimal control problems in continuous and discrete time

Value constrained model-free continuous control

On Bellman equations for continuous-time policy evaluation I: discretization and approximation

On the Limited Representational Power of Value Functions and its Links to Statistical (In)Efficiency

Continuous time Stochastic optimal control under discrete time partial observations

Continuity of Parametric Optima for Possibly Discontinuous Functions and Noncompact Decision Sets

Robust Policy Optimization in Continuous-time Mixed $\mathcal{H}_2/\mathcal{H}_\infty$ Stochastic Control

Reinforcement Learning Policies in Continuous-Time Linear Systems

Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off