Abstract: Motivated by the recent empirical success of policy-based reinforcement learning (RL), there has been a research trend studying the performance of policy-based RL methods on standard control benchmark problems. In this paper, we examine the effectiveness of policy-based RL methods on an important robust control problem, namely $\mu$ synthesis. We build a connection between robust adversarial RL and $\mu$ synthesis, and develop a model-free version of the well-known $DK$-iteration for solving state-feedback $\mu$ synthesis with static $D$-scaling. In the proposed algorithm, the $K$ step mimics the classical central path algorithm by incorporating a recently developed double-loop adversarial RL method as a subroutine, and the $D$ step is based on model-free finite difference approximation. An extensive numerical study is also presented to demonstrate the utility of our proposed model-free algorithm. Our study sheds new light on the connections between adversarial RL and robust control.

What problem does this paper attempt to address?

This paper asks whether policy-based reinforcement learning (RL) can solve an important robust control problem: state-feedback μ-synthesis. Specifically, it establishes a connection between robust adversarial reinforcement learning (RARL) and μ-synthesis, and develops a model-free μ-synthesis algorithm. The algorithm optimizes the controller for robust performance using only "black-box" interaction data from a simulator, without relying on an exact model of the system.

### Core Problems of the Paper

1. **Establishing the Connection**: The paper first establishes the theoretical connection between robust adversarial reinforcement learning (RARL) and μ-synthesis. μ-synthesis is a controller design method that aims to minimize the structured singular value of the closed-loop system, which serves as its robust performance index.

2. **Developing the Algorithm**: Building on this connection, the paper proposes a model-free μ-synthesis algorithm that can be regarded as a model-free version of the classical DK-iteration. In this algorithm:
   - **K-step**: Mimicking the classical central path algorithm, the controller is optimized using a recently developed double-loop adversarial reinforcement learning method as a subroutine.
   - **D-step**: The static D-scaling matrix is optimized using a model-free finite-difference approximation.

3. **Numerical Study**: The paper demonstrates the effectiveness of the proposed model-free algorithm through an extensive numerical study. The experiments verify the algorithm's performance on the μ-synthesis problem and offer a new perspective on the relationship between adversarial reinforcement learning and robust control.
### Formula Explanation

- **Structured Singular Value (μ)**:
  \[ \mu_K=\inf\left\{\gamma\;\middle|\;\text{for all }\Delta\in\boldsymbol{\Delta}\text{ satisfying }\|\Delta\|_\infty\leq\frac{1}{\gamma},\text{ the closed-loop system is well-defined, stable, and }\|T_{d\to e}(\Delta)\|_\infty\leq\gamma\right\} \]
  where $\boldsymbol{\Delta}$ denotes the set of admissible structured uncertainties and $T_{d\to e}(\Delta)$ is the closed-loop map from $d$ to $e$ under the uncertainty $\Delta$.

- **Upper Bound via D-Scaling**:
  \[ \bar{\mu}_K = \inf_{D\in\mathbf{D}}\left\|\text{diag}(D, I)\,F_l(G, K)\,\text{diag}(D^{-1}, I)\right\|_\infty \]
  where $\mathbf{D}$ denotes the set of admissible scaling matrices and $F_l(G, K)$ is the lower linear fractional transformation of $G$ and $K$.

- **H∞-Norm Estimation**:
  \[ \|\tilde{G}\|_\infty\approx\text{HinfOracle}(\tilde{G}, N) \]
  where $N$ is a user-specified parameter, and $\text{HinfOracle}$ uses simulated input/output data to query $\tilde{G}_N$ and outputs a number that estimates the largest singular value $\bar{\sigma}(\tilde{G}_N)$.

- **Central Difference Gradient Estimation**:
  \[ g_j(d,\epsilon)=\frac{H(d + \epsilon e_j)-H(d-\epsilon e_j)}{2\epsilon},\quad j\in[m] \]
  where $e_j$ is the $j$-th standard basis vector and $\epsilon>0$ is the perturbation size.

### Conclusion

The paper establishes the connection between robust adversarial reinforcement learning and μ-synthesis, proposes a model-free μ-synthesis algorithm, and verifies its effectiveness through numerical experiments. This provides both a new method for solving complex robust control problems and new insight into the application of adversarial reinforcement learning in robust control.
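To make the central-difference gradient estimate from the Formula Explanation concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that $H$ is a black-box scalar function of the scaling parameters $d$. The function `scaled_norm`, the matrix `M`, the exponential parameterization of the scaling, and the step size are all hypothetical choices made for illustration: a constant 2×2 matrix stands in for the closed-loop system, and its largest singular value stands in for the H∞-norm estimate that would otherwise come from simulated data.

```python
import numpy as np

def central_difference_gradient(H, d, eps=1e-4):
    """Estimate the gradient of a black-box scalar function H at d.

    Computes g_j = (H(d + eps*e_j) - H(d - eps*e_j)) / (2*eps) for each
    coordinate j, using only evaluations of H (no model of H is needed).
    """
    d = np.asarray(d, dtype=float)
    grad = np.zeros_like(d)
    for j in range(d.size):
        e_j = np.zeros_like(d)
        e_j[j] = 1.0  # j-th standard basis vector
        grad[j] = (H(d + eps * e_j) - H(d - eps * e_j)) / (2.0 * eps)
    return grad

# Hypothetical toy problem (an assumption, not taken from the paper):
# a constant matrix M plays the role of the closed-loop system.
M = np.array([[1.0, 10.0],
              [0.1, 1.0]])

def scaled_norm(d):
    """Largest singular value of diag(D, I) @ M @ diag(D^{-1}, I).

    The scalar scaling is parameterized as D = exp(d[0]) so that it stays
    positive for any real d[0] (an illustrative choice).
    """
    D = np.diag([np.exp(d[0]), 1.0])
    D_inv = np.diag([np.exp(-d[0]), 1.0])
    return np.linalg.norm(D @ M @ D_inv, 2)

# Gradient estimate at the unscaled starting point d = 0.
d0 = np.zeros(1)
g0 = central_difference_gradient(scaled_norm, d0)

# Plain gradient descent on the scaling parameter, driven only by
# function evaluations of scaled_norm.
d = d0.copy()
for _ in range(300):
    d = d - 0.05 * central_difference_gradient(scaled_norm, d)

print("unscaled norm:", scaled_norm(d0))
print("scaled norm after descent:", scaled_norm(d))
```

For this particular toy matrix the minimum can be checked by hand: choosing $D = 0.1$ turns the scaled matrix into the all-ones 2×2 matrix, whose largest singular value is 2, compared with roughly 10.1 without scaling. Each gradient estimate costs $2m$ evaluations of $H$ for $m$ scaling parameters, which is the price of not needing a model.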

Model-Free $μ$ Synthesis via Adversarial Reinforcement Learning
