Abstract: Value function approximation methods, such as Q-learning, are more widely used in discrete control than in continuous control, because the optimal action is easier to select in the discrete setting. In continuous control, selecting the optimal action is a non-convex optimization problem over a complex value function. Some notable studies simplify this problem by assuming that the value function is quadratic in the actions or by discretizing the action space. However, the performance of the resulting policy degrades when these assumptions do not hold. To address this problem, we propose a framework that combines swarm intelligence algorithms with value-based Reinforcement Learning, where the swarm intelligence algorithms search for the optimal action with respect to the state and the value function. To support the correctness of this framework, we establish, under certain conditions, a convergence rate for swarm intelligence algorithms that holds with high probability. We then implement the framework by searching for the optimal actions of a whole batch of states on the GPU, which supports batch training. Furthermore, we employ population-based atomic actions for compatibility with existing work on discrete control problems. The comparisons use four classical control models and four robot simulation environments. According to the empirical results, our framework produces a policy comparable with that of policy-based algorithms using 10% of the timesteps in continuous control.

Note to Practitioners: This paper is motivated by the exploration-exploitation dilemma that arises when Reinforcement Learning is used to solve continuous control tasks. Broadly, there are two feasible ways to balance exploration and exploitation: stochastic exploration (e.g., $\varepsilon$-greedy) and prioritized exploration. The prioritized approach is the better choice because it is more data-efficient than the stochastic one. Normally, prioritized exploration works well in value-based Reinforcement Learning algorithms rather than in policy-based ones; meanwhile, policy-based algorithms are better suited to continuous control tasks than value-based ones. To resolve this conflict, we design a particle swarm optimization specifically to maximize the Q-value of the action in Q-learning. Our design can be hybridized with various swarm intelligence and value-based Reinforcement Learning algorithms, and it can be embedded easily in most intelligent control systems. The aim of this study is to solve continuous control tasks with value-based algorithms as a first step toward applying prioritized exploration. The simulation results verify the effectiveness and efficiency of our design.
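
The action search described in the abstract can be illustrated with a minimal sketch: a standard global-best particle swarm optimization that maximizes Q(s, a) over the action for a whole batch of states at once. This is not the authors' implementation. It uses NumPy on the CPU in place of a GPU, and the names `toy_q` and `pso_argmax_q`, the toy non-convex Q-function, and all hyperparameter values are assumptions made only for illustration.

```python
import numpy as np


def toy_q(states, actions):
    """Hypothetical non-convex Q(s, a) standing in for a learned critic.

    states: (B, 1), actions: (B, P, A) -> Q-values of shape (B, P).
    The global maximum sits at a = tanh(s); the cosine term adds local optima.
    """
    target = np.tanh(states)[:, None, :]            # (B, 1, 1)
    diff = actions - target                         # (B, P, A)
    return (-(diff ** 2) + 0.5 * np.cos(8.0 * diff)).sum(axis=-1)


def pso_argmax_q(q_fn, states, action_dim, low=-1.0, high=1.0,
                 n_particles=64, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Approximate argmax_a Q(s, a) for every state in the batch with PSO."""
    rng = np.random.default_rng(seed)
    batch = states.shape[0]
    rows = np.arange(batch)
    shape = (batch, n_particles, action_dim)

    pos = rng.uniform(low, high, size=shape)        # candidate actions
    vel = np.zeros(shape)
    pbest_pos = pos.copy()                          # per-particle best action
    pbest_val = q_fn(states, pos)                   # (B, P)

    for _ in range(n_iters):
        # Best particle of each state's swarm (one swarm per state).
        g_idx = pbest_val.argmax(axis=1)
        gbest_pos = pbest_pos[rows, g_idx][:, None, :]   # (B, 1, A)

        r1 = rng.random(shape)
        r2 = rng.random(shape)
        vel = (w * vel
               + c1 * r1 * (pbest_pos - pos)
               + c2 * r2 * (gbest_pos - pos))
        pos = np.clip(pos + vel, low, high)         # stay inside action bounds

        val = q_fn(states, pos)
        improved = val > pbest_val
        pbest_pos = np.where(improved[..., None], pos, pbest_pos)
        pbest_val = np.where(improved, val, pbest_val)

    g_idx = pbest_val.argmax(axis=1)
    return pbest_pos[rows, g_idx], pbest_val[rows, g_idx]


if __name__ == "__main__":
    batch_states = np.linspace(-2.0, 2.0, 5)[:, None]
    best_actions, best_q = pso_argmax_q(toy_q, batch_states, action_dim=1)
    print(best_actions.ravel(), best_q)
```

In a Q-learning loop, the returned Q-values would play the role of the maximum over next-state actions in the bootstrapped target, and the returned actions would serve as the greedy policy. Keeping one swarm per state along the batch axis is what makes the search map naturally onto batched GPU kernels.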

Value approximation with least squares support vector machine in reinforcement learning system

Gradient Q(σ, Λ): A Unified Algorithm with Function Approximation for Reinforcement Learning

Using SVM To Model And Control Nonlinear Dynamical Systems

Multi-Agent Q-Value Mixing Network with Covariance Matrix Adaptation Strategy for the Voltage Regulation Problem

Continuous Control With Swarm Intelligence Based Value Function Approximation

Minimax Q-learning Control for Linear Systems Using the Wasserstein Metric

How to discretize continuous state-action spaces in Q-learning: A symbolic control approach

Implicit Posterior Sampling Reinforcement Learning for Continuous Control

Manifold Regularization Based Approximate Value Iteration For Learning Control

Bayesian Inference Based Learning Automaton Scheme in Q-model Environments

Hybrid Reinforcement Learning for Optimal Control of Non-Linear Switching System

A Fuzzy Multi-Step Q Learning Algorithm Based On Q(Lambda)-Learning And Its Application

Average Cost Optimal Control of Stochastic Systems Using Reinforcement Learning

Value iteration for LQR control of unknown stochastic-parameter linear systems

Control of Free-Floating Space Robots to Capture Targets Using Soft Q-Learning

Policy Sharing Using Aggregation Trees for $Q$-Learning in a Continuous State and Action Spaces

Stochastic Q-learning for Large Discrete Action Spaces

Approximate Q-Learning for Controlled Diffusion Processes and its Near Optimality

Online support vector regression for reinforcement learning

An Optimal Tracking Control Method with Q-learning for Discrete-time Linear Switched System