Abstract: The application of reinforcement learning in industrial settings has made agent safety a research hotspot. Traditional methods address safety mainly by modifying the objective function or the agent's exploration process. However, these methods can hardly prevent the agent from entering dangerous states, because most of them ignore the damage caused by unsafe states, and the resulting solutions are unsatisfactory. To solve this problem, we propose a safe Q-learning method based on constrained Markov decision processes. The method adds safety constraints to the model as prerequisites and modifies the standard Q-learning algorithm so that it seeks the optimal solution subject to the safety premise being satisfied. While searching for the optimal state-action value, the agent's feasible space is restricted to a safe space by filtering the action space with the added constraints. Because traditional solution methods tend to obtain only locally optimal solutions and are therefore not applicable to the safe Q-learning model, we linearize the constraint functions and use the Lagrange multiplier method to solve for the optimal action that can be performed in the current state. This not only improves the efficiency and accuracy of the algorithm but also guarantees that the global optimal solution is obtained. Experiments verify the effectiveness of the algorithm.
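
The idea of restricting the agent's feasible space to a safe space can be illustrated with a minimal tabular sketch. It is not the paper's algorithm: the corridor environment, the constraint_cost function, the zero safety budget, and all hyperparameters are hypothetical choices made for illustration, and a simple action filter stands in for the Lagrange multiplier solution of the linearized constraints described above.

```python
import random

# Hypothetical 1-D corridor: states 0..4, state 0 is unsafe, state 4 is the goal.
N_STATES, UNSAFE, GOAL = 5, 0, 4
ACTIONS = (-1, +1)  # move left / move right


def step(state, action):
    """Deterministic transition; reward 1 only on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def constraint_cost(state, action):
    """Assumed safety signal: cost 1 if the action leads into the unsafe state."""
    nxt, _ = step(state, action)
    return 1.0 if nxt == UNSAFE else 0.0


def safe_actions(state, budget=0.0):
    """Keep only the actions whose constraint cost stays within the budget."""
    return [a for a in ACTIONS if constraint_cost(state, a) <= budget]


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy Q-learning in which every choice is limited to safe actions."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    visited = set()
    for _ in range(episodes):
        state = 1  # start next to the unsafe state
        for _ in range(50):
            feasible = safe_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(feasible)
            else:
                action = max(feasible, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            visited.add(nxt)
            # Bootstrap only over actions that are safe in the next state.
            target = reward
            if nxt != GOAL:
                target += gamma * max(q[(nxt, a)] for a in safe_actions(nxt))
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if state == GOAL:
                break
    return q, visited


if __name__ == "__main__":
    q_table, seen = train()
    print("unsafe state visited:", UNSAFE in seen)
    print("Q(3, right):", round(q_table[(3, 1)], 3))
```

Because the filter is applied both when selecting an action and when computing the bootstrap target, the unsafe transition is never taken and never contributes to the learned values. In this sketch the constraint is known exactly in advance, which is a strong simplifying assumption.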

Convergence of the Q-Ae Learning under Deterministic MDPs and Its Efficiency under the Stochastic Environment

Gradient Q : A Unified Algorithm with Function Approximation for Reinforcement Learning

Convergence Rate of Primal-Dual Approach to Constrained Reinforcement Learning with Softmax Policy

Discrete-Time Deterministic $Q$-Learning: A Novel Convergence Analysis

Relative Q-Learning for Average-Reward Markov Decision Processes with Continuous States

On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes

An immediate-return reinforcement learning for the atypical Markov decision processes

Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization

Q-greedyUCB: a New Exploration Policy to Learn Resource-Efficient Scheduling

Safe Q-Learning Method Based on Constrained Markov Decision Processes

Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics

Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis

Q-learning Solution for Optimal Consensus Control of Discrete-Time Multiagent Systems Using Reinforcement Learning

Convergent and Efficient Deep Q Network Algorithm

A Q-learning algorithm for Markov decision processes with continuous state spaces

A Structure-aware Online Learning Algorithm for Markov Decision Processes

Bayesian Q learning method with Dyna architecture and prioritized sweeping

Multi-Agent Alternate Q-Learning

A Hybrid PAC Reinforcement Learning Algorithm

A Heuristic Dyna Optimizing Algorithm Using Approximate Model Representation

Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments