Feasibility Consistent Representation Learning for Safe Reinforcement Learning

Zhepeng Cen, Yihang Yao, Zuxin Liu, Ding Zhao
2024-06-13
Abstract: In the field of safe reinforcement learning (RL), finding a balance between satisfying safety constraints and optimizing reward performance presents a significant challenge. A key obstacle in this endeavor is the estimation of safety constraints, which is typically more difficult than estimating a reward metric due to the sparse nature of the constraint signals. To address this issue, we introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL). This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL. Leveraging self-supervised learning techniques and a more learnable safety metric, our approach enhances the policy learning and constraint estimation. Empirical evaluations across a range of vector-state and image-based tasks demonstrate that our method is capable of learning a better safety-aware embedding and achieving superior performance than previous representation learning baselines.
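The abstract describes pairing a self-supervised representation objective with a feasibility-oriented one. The following is a hedged illustration of what such a combined loss could look like, not the paper's implementation: the tanh encoder, the linear latent dynamics model, the 1 − cosine consistency term (in the style of SPR/BYOL-type latent prediction), the sigmoid feasibility head, the weight `lam`, and every function name are assumptions made for illustration. Each sample carries a precomputed feasibility target in [0, 1]; only the loss is evaluated, and no training loop is shown.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def encode(W_enc, state):
    """Hypothetical encoder: z = tanh(W_enc @ s)."""
    return [math.tanh(v) for v in matvec(W_enc, state)]


def predict_next_latent(W_dyn, z, action):
    """Hypothetical linear latent dynamics: z_next_hat = W_dyn @ [z; a]."""
    return matvec(W_dyn, z + action)  # list concatenation forms [z; a]


def cosine(u, v, eps=1e-8):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def feasibility_head(w_head, z):
    """Sigmoid output in (0, 1), the same range as the assumed feasibility target."""
    logit = sum(w * zi for w, zi in zip(w_head, z))
    return 1.0 / (1.0 + math.exp(-logit))


def combined_loss(params, batch, lam=1.0):
    """Mean latent-consistency loss plus lam * mean feasibility regression loss."""
    W_enc, W_dyn, w_head = params
    consistency, feasibility = 0.0, 0.0
    for state, action, next_state, h_target in batch:
        z = encode(W_enc, state)
        z_next_hat = predict_next_latent(W_dyn, z, action)
        # Target embedding of the next state; with autodiff this would be stop-gradient.
        z_next = encode(W_enc, next_state)
        consistency += 1.0 - cosine(z_next_hat, z_next)
        feasibility += (feasibility_head(w_head, z) - h_target) ** 2
    n = len(batch)
    return consistency / n + lam * feasibility / n


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
latent_dim, state_dim, action_dim = 3, 4, 2
params = (
    random_matrix(latent_dim, state_dim, rng),
    random_matrix(latent_dim, latent_dim + action_dim, rng),
    [rng.gauss(0.0, 0.5) for _ in range(latent_dim)],
)
# (state, action, next_state, feasibility target) tuples with made-up values.
batch = [
    ([0.1, -0.2, 0.3, 0.0], [1.0, 0.0], [0.2, -0.1, 0.3, 0.1], 0.9),
    ([0.5, 0.4, -0.3, 0.2], [0.0, 1.0], [0.4, 0.5, -0.2, 0.2], 0.0),
]
loss = combined_loss(params, batch, lam=0.5)
print(f"combined loss: {loss:.4f}")
```

In an autodiff framework, the encoded next state would be wrapped in a stop-gradient so that the consistency term trains the online encoder and dynamics model while helping to prevent representational collapse, as in SimSiam-style objectives; plain Python has no gradients, so the sketch only marks this with a comment.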
Machine Learning
What problem does this paper attempt to address?
This paper focuses on a core challenge in safe reinforcement learning (RL): how to optimize reward performance while satisfying safety constraints. Estimating safety is typically harder than estimating reward because constraint signals are sparse, which leads to inaccurate estimates of the safety constraints.
To address this, the authors propose a new framework called Feasibility Consistent Safe Reinforcement Learning (FCSRL). It combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from raw states, improving both policy learning and constraint estimation. FCSRL relies on self-supervised learning techniques and a more learnable safety metric: the paper introduces the feasibility score, a learning target that exhibits smoother properties than other cost metrics. Predicting this score serves as an auxiliary task for representation learning, sharpening the safety-relevant features in the embedding and thereby finding a better balance between maximizing reward and satisfying safety constraints.
Experiments on a range of vector-state and image-based tasks show that FCSRL learns better safety-aware embeddings and outperforms previous representation learning baselines, especially under stricter constraints.
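The summary does not spell out how the feasibility score is computed. A minimal sketch, assuming per-step costs and a score defined as the discounted maximum of future costs, h_t = max(c_t, γ·h_{t+1}) (a form borrowed from discounted reachability analysis; the paper's exact definition may differ), contrasted with the usual discounted cumulative cost. For binary costs this assumed score equals γ^k, where k is the number of steps until the next violation. The function names are hypothetical.

```python
def feasibility_scores(costs, gamma=0.99):
    """Assumed feasibility score: discounted maximum of future costs.

    Computed by the backward recursion h_t = max(c_t, gamma * h_{t+1}),
    with h taken as 0 beyond the last step.
    """
    scores = [0.0] * len(costs)
    h_next = 0.0
    for t in range(len(costs) - 1, -1, -1):
        h_next = max(float(costs[t]), gamma * h_next)
        scores[t] = h_next
    return scores


def discounted_cost_to_go(costs, gamma=0.99):
    """Usual discounted cumulative cost, C_t = c_t + gamma * C_{t+1}."""
    out = [0.0] * len(costs)
    c_next = 0.0
    for t in range(len(costs) - 1, -1, -1):
        c_next = float(costs[t]) + gamma * c_next
        out[t] = c_next
    return out


# Sparse binary cost signal: two consecutive violations at steps 1 and 2.
costs = [0, 1, 1, 0]
print(feasibility_scores(costs, gamma=0.5))     # [0.5, 1.0, 1.0, 0.0]
print(discounted_cost_to_go(costs, gamma=0.5))  # [0.75, 1.5, 1.0, 0.0]
```

Both targets propagate a sparse violation backward to earlier states, but with two consecutive violations the cumulative cost rises to 1.5 while the assumed feasibility score saturates at 1.0. This bounded [0, 1] range is one plausible reason such a target can be easier to regress as an auxiliary task.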