James Queeney, Erhan Can Ozcan, Ioannis Ch. Paschalidis, Christos G. Cassandras
Abstract: Robustness and safety are critical for the trustworthy deployment of deep reinforcement learning. Real-world decision making applications require algorithms that can guarantee robust performance and safety in the presence of general environment disturbances, while making limited assumptions on the data collection process during training. In order to accomplish this goal, we introduce a safe reinforcement learning framework that incorporates robustness through the use of an optimal transport cost uncertainty set. We provide an efficient implementation based on applying Optimal Transport Perturbations to construct worst-case virtual state transitions, which does not impact data collection during training and does not require detailed simulator access. In experiments on continuous control tasks with safety constraints, our approach demonstrates robust performance while significantly improving safety at deployment time compared to standard safe reinforcement learning.
What problem does this paper attempt to address?
The paper addresses robustness and safety in deep reinforcement learning (DRL). The authors propose a framework that makes safe reinforcement learning robust to environment disturbances by using an uncertainty set defined through an Optimal Transport Cost (OTC), so that both performance and safety hold up at deployment. The framework targets general forms of environment uncertainty, makes limited assumptions about how data are collected during training, and does not require detailed simulator access.
### Core Problems of the Paper
1. **Robustness and Safety**: How to ensure the robustness and safety of deep reinforcement learning algorithms in the presence of environmental perturbations?
2. **General Uncertainty Handling**: How to design methods that can handle various forms of environmental uncertainties when lacking prior knowledge about the structure of potential perturbations?
3. **Data Collection and Training Process**: How can robustness and safety be achieved with a standard data collection process, without altering data collection in the training environment and without requiring detailed simulator access?
### Solutions
The authors propose a robust reinforcement learning framework based on optimal transport cost, with the following main contributions:
1. **Robustness Framework**: A safe reinforcement learning framework is proposed, incorporating robustness through the uncertainty set defined by the optimal transport cost.
2. **Reformulation of the Worst-Case Optimization Problem**: The authors show that worst-case virtual state transitions can be constructed directly in the state space using Optimal Transport Perturbations (OTP), which recasts the worst-case optimization problem in a more tractable form.
3. **Efficient Implementation**: An efficient deep reinforcement learning implementation applies Optimal Transport Perturbations to construct worst-case virtual state transitions, so data collection during training is unaffected and detailed simulator access is not required.
4. **Experimental Verification**: Experiments on continuous control tasks with safety constraints show robust performance and significantly improved safety at deployment time compared to standard safe reinforcement learning.
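The idea behind contributions 2 and 3 can be illustrated with a toy sketch. It assumes a one-dimensional state, a small hand-written candidate set of perturbation functions, and a transport budget named `eps`; all of these names and values are hypothetical and are not taken from the paper, whose implementation learns the perturbation with deep networks. The sketch picks the candidate that minimizes the average reward value of the perturbed next states while keeping the average transport cost within the budget.

```python
def select_perturbation(samples, candidates, value_fn, dist, eps):
    """Return (name, value) of the feasible candidate with the lowest mean value.

    A candidate g is feasible if the mean of dist(s', g(s')) over the sampled
    next states is at most eps (hypothetical budget).
    """
    best_name, best_val = None, float("inf")
    for name, g in candidates.items():
        mean_cost = sum(dist(s, g(s)) for s in samples) / len(samples)
        if mean_cost > eps:
            continue  # violates the transport budget
        mean_val = sum(value_fn(g(s)) for s in samples) / len(samples)
        if mean_val < best_val:
            best_name, best_val = name, mean_val
    return best_name, best_val


def toy_value(s):
    # Toy reward value function (assumption): highest at s = 1.0.
    return -abs(s - 1.0)


def abs_dist(s, t):
    # Toy transport cost between two scalar states (assumption).
    return abs(s - t)


# Next states sampled from the nominal transition (toy data).
samples = [0.0, 1.0, 2.0]

# Hypothetical finite candidate set standing in for the function class G.
candidates = {
    "identity": lambda s: s,
    "shift+0.5": lambda s: s + 0.5,
    "shift+2.0": lambda s: s + 2.0,
}

best_name, best_val = select_perturbation(
    samples, candidates, toy_value, abs_dist, eps=0.6
)

# Worst-case virtual next states, built without collecting any new data.
virtual_next_states = [candidates[best_name](s) for s in samples]
```

Here the largest shift would reduce the value most but exceeds the budget, so the smaller shift is selected. The perturbation acts only on already-collected next states, which is why this kind of construction leaves data collection untouched.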
### Formulas and Concepts
- **Optimal Transport Cost**:
\[
\text{OTC}_{d_{s,a}}(\hat{p}_{s,a}, p_{s,a})=\inf_{\nu \in \Gamma(\hat{p}_{s,a}, p_{s,a})} \int_{S \times S} d_{s,a}(\hat{s}', s') \, d\nu(\hat{s}', s')
\]
where $\Gamma(\hat{p}_{s,a}, p_{s,a})$ is the set of all couplings of $\hat{p}_{s,a}$ and $p_{s,a}$.
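As a small worked illustration of this definition, the sketch below computes the optimal transport cost between two empirical distributions that each place uniform weight on the same number of points. In that special case an optimal coupling can be taken to be a one-to-one matching, so a brute-force search over permutations suffices for tiny examples. The function name, the sample values, and the absolute-difference cost are assumptions chosen for illustration.

```python
from itertools import permutations


def ot_cost_uniform(xs, ys, dist):
    """OT cost between uniform empirical distributions on xs and ys.

    Assumes len(xs) == len(ys); brute force, so only suitable for tiny inputs.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("both samples must have the same size")
    best = float("inf")
    for perm in permutations(range(n)):
        # Average cost of matching xs[i] with ys[perm[i]].
        cost = sum(dist(xs[i], ys[perm[i]]) for i in range(n)) / n
        best = min(best, cost)
    return best


def abs_dist(x, y):
    # Toy ground cost d(s_hat', s') on scalar states (assumption).
    return abs(x - y)


nominal_next_states = [0.0, 1.0, 2.0]
shifted_next_states = [0.5, 1.5, 2.5]
cost = ot_cost_uniform(nominal_next_states, shifted_next_states, abs_dist)
```

For these samples the cheapest matching pairs each point with its shifted copy, giving a cost of 0.5.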
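- **Robust Bellman Operators** (for the reward $r$ the worst case is an infimum over the uncertainty set; for the cost $c$ it is a supremum):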
\[
T^\pi_{P,r} Q_r(s,a)=r(s,a)+\gamma \inf_{p_{s,a} \in P_{s,a}} \mathbb{E}_{s' \sim p_{s,a}} [V^\pi_r(s')]
\]
\[
T^\pi_{P,c} Q_c(s,a)=c(s,a)+\gamma \sup_{p_{s,a} \in P_{s,a}} \mathbb{E}_{s' \sim p_{s,a}} [V^\pi_c(s')]
\]
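To make the two operators concrete, the following sketch performs one robust backup for a single state-action pair in a tabular toy problem. It assumes the uncertainty set $P_{s,a}$ is replaced by a short list of candidate transition distributions; the state names, values, and discount factor are made up for illustration. The reward backup takes the minimum expected value over the candidates, and the cost backup takes the maximum.

```python
def expected_value(p, values):
    """Expectation of values[s'] under the distribution p (dict: state -> prob)."""
    return sum(prob * values[s_next] for s_next, prob in p.items())


def robust_backups(r, c, gamma, candidates, v_r, v_c):
    """One robust Bellman backup for reward (min) and cost (max)."""
    q_r = r + gamma * min(expected_value(p, v_r) for p in candidates)
    q_c = c + gamma * max(expected_value(p, v_c) for p in candidates)
    return q_r, q_c


# Toy value functions over two next states (assumed numbers).
v_r = {"safe": 10.0, "risky": 2.0}
v_c = {"safe": 0.0, "risky": 4.0}

# Finite stand-in for the uncertainty set P_{s,a} (assumption).
candidates = [
    {"safe": 0.9, "risky": 0.1},
    {"safe": 0.5, "risky": 0.5},
]

q_r, q_c = robust_backups(
    r=1.0, c=0.0, gamma=0.5, candidates=candidates, v_r=v_r, v_c=v_c
)
```

Note that the minimizing and maximizing distributions are chosen independently, so the reward and cost backups need not share the same worst-case transition.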
- **Optimal Transport Perturbation**:
\[
g^r_{s,a} \in \arg \min_{g \in G} \mathbb{E}_{\hat{s}' \sim \hat{p}_{s,a}} [V^\pi_r(g(\hat{s}'))] \quad \text{s.t.} \quad \mathbb{E}_{\hat{s}' \sim \hat{p}_{s,a}} [d_{s,a}(\hat{s}', g(\hat{s}'))]