Abstract: Motivated by the recent empirical success of policy-based reinforcement learning (RL), there has been a research trend studying the performance of policy-based RL methods on standard control benchmark problems. In this paper, we examine the effectiveness of policy-based RL methods on an important robust control problem, namely $\mu$ synthesis. We build a connection between robust adversarial RL and $\mu$ synthesis, and develop a model-free version of the well-known $DK$-iteration for solving state-feedback $\mu$ synthesis with static $D$-scaling. In the proposed algorithm, the $K$ step mimics the classical central path algorithm by incorporating a recently developed double-loop adversarial RL method as a subroutine, and the $D$ step is based on model-free finite difference approximation. An extensive numerical study is also presented to demonstrate the utility of our proposed model-free algorithm. Our study sheds new light on the connections between adversarial RL and robust control.
What problem does this paper attempt to address?
The paper asks whether policy-based reinforcement learning (RL) can solve an important robust control problem: state-feedback μ synthesis. It connects robust adversarial reinforcement learning (RARL) to μ synthesis and develops a model-free μ-synthesis algorithm. The algorithm improves the controller's robust performance using only "black-box" interaction data from a simulator, without requiring an exact model of the system.
### Core Problems of the Paper
1. **Establishing the Connection**: The paper first connects robust adversarial reinforcement learning (RARL) to μ synthesis. μ synthesis is a controller design method that seeks to minimize the structured singular value of the closed-loop system, which serves as its robust performance index.
2. **Developing the Algorithm**: Building on this connection, the paper proposes a model-free μ-synthesis algorithm that can be viewed as a model-free version of the classical DK-iteration. In this algorithm:
- **K-step**: Mimicking the classical central path algorithm, the controller is optimized by using a recently developed double-loop adversarial RL method as a subroutine.
- **D-step**: The static D-scaling is updated using a model-free finite-difference approximation.
3. **Numerical Study**: The paper demonstrates the utility of the proposed model-free algorithm through an extensive numerical study. The experiments examine how well the algorithm solves the μ-synthesis problem and offer a new perspective on the relationship between adversarial reinforcement learning and robust control.
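
The alternation between the K-step and the D-step described in item 2 can be sketched as a plain loop. This is only a structural illustration under stated assumptions: `dk_iteration`, `k_step`, `d_step`, and `toy_cost` are hypothetical names, the controller and scaling are reduced to scalars, and a toy quadratic cost stands in for the robust performance level that the paper obtains from simulator data.

```python
def toy_cost(K, d):
    # Hypothetical stand-in for the scaled closed-loop performance level.
    return (K - 1.0) ** 2 + (d - 2.0) ** 2 + 1.0

def k_step(K, d):
    # Placeholder for the adversarial-RL subroutine: one gradient step in K
    # with the scaling d held fixed.
    return K - 0.25 * 2.0 * (K - 1.0)

def d_step(K, d, eps=1e-3, step=0.25):
    # Placeholder for the model-free D-step: central-difference slope in d
    # with the controller K held fixed, followed by one descent step.
    g = (toy_cost(K, d + eps) - toy_cost(K, d - eps)) / (2.0 * eps)
    d_new = d - step * g
    return d_new, toy_cost(K, d_new)

def dk_iteration(k_step, d_step, K0, d0, num_iters=20):
    """Alternate between improving K (d fixed) and improving d (K fixed)."""
    K, d = K0, d0
    costs = []
    for _ in range(num_iters):
        K = k_step(K, d)
        d, cost = d_step(K, d)
        costs.append(cost)
    return K, d, costs

K_final, d_final, costs = dk_iteration(k_step, d_step, K0=5.0, d0=-3.0)
```

In the actual setting, `K` would be a state-feedback gain matrix and `d` a vector of scaling parameters; the loop structure is the only part this sketch is meant to convey.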
### Formula Explanation
- **Structured Singular Value (μ)**:
\[
\mu_K=\inf\left\{\gamma \;\middle|\; \text{for all }\Delta\in\boldsymbol{\Delta}\text{ satisfying }\|\Delta\|_\infty\leq\frac{1}{\gamma},\text{ the closed-loop system is well-defined, stable, and }\|T_{d\to e}(\Delta)\|_\infty\leq\gamma\right\}
\]
- **D-Scaling Upper Bound**:
\[
\bar{\mu}_K = \inf_{D\in\mathbf{D}}\left\|\operatorname{diag}(D, I)\,F_l(G, K)\,\operatorname{diag}(D^{-1}, I)\right\|_\infty
\]
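
To see why a scaling can tighten a norm bound, consider a simplified static analogue of the expression above. This is an assumed illustration, not the paper's setup: a constant 2×2 matrix `M` replaces the transfer function \(F_l(G, K)\), and the whole matrix is scaled by a diagonal `D` rather than by \(\operatorname{diag}(D, I)\).

```python
import numpy as np

# Hypothetical constant matrix standing in for the closed-loop map.
M = np.array([[1.0, 100.0],
              [0.01, 1.0]])

# Diagonal scaling and its inverse (illustrative choice with d1/d2 = 0.01).
D = np.diag([0.1, 10.0])
D_inv = np.diag([10.0, 0.1])

# Largest singular value before and after scaling.
unscaled_norm = np.linalg.norm(M, 2)
scaled_norm = np.linalg.norm(D @ M @ D_inv, 2)

# The similarity scaling preserves eigenvalues, so the spectral radius
# is a lower bound on every scaled norm.
spectral_radius = max(abs(np.linalg.eigvals(M)))
```

Here the unscaled norm is roughly 100, while the scaled matrix is the all-ones matrix with norm 2, which matches the spectral radius of `M`. Searching over `D` for the smallest scaled norm is the static counterpart of the infimum in the bound.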
- **H∞-Norm Estimation**:
\[
\|\tilde{G}\|_\infty\approx\text{HinfOracle}(\tilde{G}, N)
\]
where \(N\) is a user-specified parameter, and \(\text{HinfOracle}\) queries \(\tilde{G}_N\) through simulated input/output data and returns a number that estimates the largest singular value \(\bar{\sigma}(\tilde{G}_N)\).
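
The summary does not say how \(\text{HinfOracle}\) works internally. One plausible realization, sketched here purely as an assumption, is power iteration on \(\tilde{G}_N^\top \tilde{G}_N\) using only input/output queries. The sketch assumes a SISO linear time-invariant system with zero initial state, so that \(\tilde{G}_N\) is an \(N \times N\) lower-triangular Toeplitz matrix whose transpose can be applied by time-reversing the input and the output. The functions `simulate` and `hinf_oracle` are hypothetical.

```python
import numpy as np

def simulate(u):
    """Hypothetical black-box simulator: x[t+1] = 0.5 x[t] + u[t], y[t] = x[t]."""
    a, x = 0.5, 0.0
    y = np.empty(len(u))
    for t in range(len(u)):
        y[t] = x
        x = a * x + u[t]
    return y

def hinf_oracle(simulate, N=200, iters=100, seed=0):
    """Estimate the largest singular value of the N-step input/output map."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(N)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = simulate(v)                 # w = G_N v
        z = simulate(w[::-1])[::-1]     # z = G_N^T w via time reversal (SISO only)
        v = z / np.linalg.norm(z)       # normalize for the next iteration
    # For a unit-norm input, the output norm never exceeds the true value.
    return float(np.linalg.norm(simulate(v)))

estimate = hinf_oracle(simulate)
```

For this toy system the transfer function is \(1/(z - 0.5)\), whose H∞ norm is 2, so the estimate should approach 2 from below as \(N\) and the iteration count grow.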
- **Central Difference Gradient Estimation**:
\[
g_j(d,\epsilon)=\frac{H(d + \epsilon e_j)-H(d-\epsilon e_j)}{2\epsilon},\quad j\in[m]
\]
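
The central difference formula translates directly into code. The sketch below assumes that \(H\) is a black-box scalar function of the scaling parameters \(d \in \mathbb{R}^m\) and that \(e_j\) is the \(j\)-th standard basis vector; neither is defined in the summary. The names `central_difference_gradient` and `H_toy` are hypothetical.

```python
import numpy as np

def central_difference_gradient(H, d, eps=1e-4):
    """Estimate the gradient of a black-box function H at d, one coordinate at a time."""
    d = np.asarray(d, dtype=float)
    g = np.zeros_like(d)
    for j in range(d.size):
        e_j = np.zeros_like(d)
        e_j[j] = 1.0
        # Two function evaluations per coordinate, 2m in total.
        g[j] = (H(d + eps * e_j) - H(d - eps * e_j)) / (2.0 * eps)
    return g

def H_toy(d):
    # Hypothetical smooth stand-in for the black-box objective.
    return d[0] ** 2 + 3.0 * d[1]

grad = central_difference_gradient(H_toy, [1.0, 2.0])
```

Each gradient estimate costs \(2m\) evaluations of \(H\). When \(H\) is itself a noisy estimate, \(\epsilon\) has to balance truncation error against noise amplification from the division by \(2\epsilon\).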
### Conclusion
The paper connects robust adversarial reinforcement learning to μ synthesis, proposes a model-free μ-synthesis algorithm, and demonstrates its utility through numerical experiments. The result is both a new approach to a challenging robust control problem and a clearer picture of how adversarial reinforcement learning relates to robust control.