Abstract: This paper proposes an advanced Reinforcement Learning (RL) method that incorporates reward shaping, safety value functions, and a quantum action selection algorithm. The method is model-free and can synthesize a finite policy that maximizes the probability of satisfying a complex task. Although RL is a promising approach, it suffers from unsafe traps and sparse rewards, which make it impractical for real-world problems. To improve safety during training, we introduce the concept of safety values, which results in a model-based adaptive scenario because transition probabilities are updated online. High-level complex tasks are usually formulated in formal languages such as Linear Temporal Logic (LTL). Another novelty of this work is the use of an Embedded Limit-Deterministic Generalized Büchi Automaton (E-LDGBA) to represent an LTL formula. The resulting deterministic policy generalizes to tasks over both infinite and finite horizons. We design an automaton-based reward, and the theoretical analysis shows that an agent following the optimal policy accomplishes the task specification with maximum probability. Furthermore, a reward shaping process is developed to avoid sparse rewards and enforce RL convergence while keeping the optimal policies invariant. In addition, inspired by quantum computing, we propose a quantum action selection algorithm to replace the existing ε-greedy algorithm for balancing exploration and exploitation during learning. Simulations demonstrate that the proposed framework achieves good performance by dramatically reducing the number of visits to unsafe states while converging to optimal policies.
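
The abstract does not spell out the reward shaping process, so the following is only an illustrative sketch. It assumes the standard potential-based form, r' = r + γΦ(s') − Φ(s), which is one well-known way to densify sparse rewards without changing which policies are optimal. The potential function, the four-state chain, and all names below are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence


def shaped_reward(reward: float, state: int, next_state: int,
                  potential: Callable[[int], float], gamma: float) -> float:
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential(next_state) - potential(state)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of a finite reward sequence."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical toy setting: states 0..3 on a chain, goal at state 3.
GOAL = 3
GAMMA = 0.9


def potential(state: int) -> float:
    """Assumed potential: negative distance to the goal (zero at the goal)."""
    return -float(abs(GOAL - state))


trajectory = [0, 1, 2, 3]          # states visited
sparse_rewards = [0.0, 0.0, 1.0]   # reward only on reaching the goal

shaped_rewards = [
    shaped_reward(r, s, s_next, potential, GAMMA)
    for r, s, s_next in zip(sparse_rewards, trajectory, trajectory[1:])
]

plain_return = discounted_return(sparse_rewards, GAMMA)
shaped_return = discounted_return(shaped_rewards, GAMMA)

# The shaping terms telescope: the two returns differ by
# gamma^T * Phi(s_T) - Phi(s_0), which does not depend on the actions taken
# in between when the potential is zero at the end of the episode.
offset = (GAMMA ** len(sparse_rewards)) * potential(trajectory[-1]) - potential(trajectory[0])
```

In this toy run every shaped step reward is nonzero, so the agent gets feedback before reaching the goal, while the shaped and unshaped returns differ only by a term fixed by the start state. That constant offset is why this form of shaping leaves the ranking of policies unchanged.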

Provably efficient exploration in quantum reinforcement learning with logarithmic worst-case regret

Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets

Quantum Reinforcement Learning with Quantum World Model

Quantum-Inspired Reinforcement Learning for Quantum Control

Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

Online Learning Quantum States with the Logarithmic Loss via VB-FTRL

Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes

Quantum Reinforcement Learning in Non-Abelian Environments: Unveiling Novel Formulations and Quantum Advantage Exploration

Efficient quantum recurrent reinforcement learning via quantum reservoir computing

Optimization of Reinforcement Learning Using Quantum Computation

Safe reinforcement learning under temporal logic with reward design and quantum action selection

Offline Quantum Reinforcement Learning in a Conservative Manner

Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

Provably Efficient Exploration in Reward Machines with Low Regret

Robust Optimization for Quantum Reinforcement Learning Control Using Partial Observations

A Quantum Technology for Reinforcement Learning on Channel Assignment

MQES: Max-Q Entropy Search for Efficient Exploration in Continuous Reinforcement Learning

Quantum framework for Reinforcement Learning: integrating Markov Decision Process, quantum arithmetic, and trajectory search

Variational Quantum Circuit Design for Quantum Reinforcement Learning on Continuous Environments