Abstract: The difficulty of identifying physical models of complex systems has motivated methods that do not rely on such modeling. Deep reinforcement learning has pioneered this direction: it learns by interacting with the system directly, without requiring a physical model. However, its black-box nature makes it difficult to apply in real-world, safety-critical systems unless the actions derived by the model can be explained. Furthermore, how to focus policy learning on critical decisions within a sparse domain remains an open research question in deep reinforcement learning. This paper proposes a novel approach for using deep reinforcement learning in safety-critical systems. The approach, BC-SRLA, combines the advantages of probabilistic modeling and reinforcement learning with the added benefit of interpretability, and it works in collaboration and synchronization with conventional decision-making strategies. BC-SRLA is activated in specific situations, such as abnormal conditions or when the system is near failure, which are identified autonomously from the fused information of the probabilistic model and reinforcement learning. Further, it is initialized with a baseline policy through policy cloning, so that only minimal interaction with the environment is needed, which addresses the challenges of using RL in safety-critical industries. The effectiveness of BC-SRLA is demonstrated through a maintenance case study on turbofan engines, where it outperforms the prior art and other baselines.

SAMBA: safe model-based & active reinforcement learning

Look Before You Leap: Safe Model-Based Reinforcement Learning with Human Intervention

KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty

Safe Model-Based Reinforcement Learning for Systems with Parametric Uncertainties

Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning

Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis

ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Safe Interactive Model-Based Learning

Provable Safe Reinforcement Learning with Binary Feedback

Hierarchical Framework for Interpretable and Probabilistic Model-Based Safe Reinforcement Learning

Safe Reinforcement Learning in Constrained Markov Decision Processes

Learning safety in model-based Reinforcement Learning using MPC and Gaussian Processes

Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Dynamics Adaptive Safe Reinforcement Learning with a Misspecified Simulator

Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning

SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization

SAM-RL: Sensing-aware model-based reinforcement learning via differentiable physics-based simulation and rendering

Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems

Meta-active Learning in Probabilistically-Safe Optimization

Context-Aware Safe Reinforcement Learning for Non-Stationary Environments

Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time