Abstract: Reinforcement learning (RL) is a class of artificial intelligence algorithms used to design adaptive optimal controllers through online learning. This paper presents a model-free, real-time, data-efficient Q-learning-based algorithm to solve the H$_{\infty}$ control problem of linear discrete-time systems. The computational complexity is shown to be reduced from $\mathcal{O}(\underline{q}^3)$ in the literature to $\mathcal{O}(\underline{q}^2)$ in the proposed algorithm, where $\underline{q}$ is quadratic in the sum of the sizes of the state variables, control inputs, and disturbance. An adaptive optimal controller is designed, and the parameters of the action and critic networks are learned online without knowledge of the system dynamics, making the proposed algorithm completely model-free. Also, sufficient probing noise is needed only in the first iteration and does not affect the proposed algorithm. With no need for an initial stabilizing policy, the algorithm converges to the closed-form solution obtained by solving the Riccati equation. To evaluate its effectiveness, a simulation study applies the proposed algorithm to real-time control of an autonomous mobility-on-demand (AMoD) system in a real-world case study.
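In the standard zero-sum-game formulation of discrete-time H$_{\infty}$ control, with $x_{k+1} = A x_k + B u_k + E w_k$ and stage cost $x_k^\top Q x_k + u_k^\top R u_k - \gamma^2 w_k^\top w_k$, the closed-form benchmark mentioned in the abstract is the solution $P$ of the game algebraic Riccati equation

$$P = Q + A^\top P A - N^\top M^{-1} N,\quad M = \begin{bmatrix} R + B^\top P B & B^\top P E \\ E^\top P B & E^\top P E - \gamma^2 I \end{bmatrix},\quad N = \begin{bmatrix} B^\top P A \\ E^\top P A \end{bmatrix}.$$

The sketch below assumes this formulation; the symbols $E$, $\gamma$, $M$, $N$ and the function name `game_riccati_vi` are our own notation, not necessarily the paper's. It computes $P$ by model-based value iteration starting from $P = 0$, which needs no initial stabilizing gain. The system matrices, weights, and $\gamma$ in the example are made-up illustration values.

```python
import numpy as np


def game_riccati_vi(A, B, E, Q, R, gamma, tol=1e-10, max_iter=10_000):
    """Model-based value iteration on the zero-sum game Riccati equation.

    Starts from P = 0, so no initial stabilizing gain is needed.
    Returns P and gains K, L such that u = -K x and w = -L x.
    """
    n, m = B.shape
    p = E.shape[1]
    BE = np.hstack([B, E])  # stacked input matrix [B E]
    # Weight on v = [u; w]: R for the control, -gamma^2 I for the disturbance
    W = np.block([[R, np.zeros((m, p))],
                  [np.zeros((p, m)), -gamma**2 * np.eye(p)]])
    P = np.zeros((n, n))
    for _ in range(max_iter):
        M = W + BE.T @ P @ BE        # [[R + B'PB, B'PE], [E'PB, E'PE - gamma^2 I]]
        N = BE.T @ P @ A             # [B'PA; E'PA]
        P_next = Q + A.T @ P @ A - N.T @ np.linalg.solve(M, N)
        P_next = 0.5 * (P_next + P_next.T)  # guard against round-off asymmetry
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    else:
        raise RuntimeError("value iteration did not converge")
    G = np.linalg.solve(W + BE.T @ P @ BE, BE.T @ P @ A)  # stacked [K; L]
    return P, G[:m], G[m:]


# Illustrative system with one mildly unstable mode (eigenvalue 1.02)
A = np.array([[0.95, 0.1], [0.0, 1.02]])
B = np.array([[0.0], [0.1]])
E = np.array([[0.1], [0.0]])
P, K, L = game_riccati_vi(A, B, E, Q=np.eye(2), R=np.eye(1), gamma=5.0)
print("P =", P, "K =", K, "L =", L, sep="\n")
```

A model-free learner can be checked against this `P`: it should approach the same gains `K` and `L` without ever using `A`, `B`, or `E`.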

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the H∞ control problem for linear discrete-time systems, motivated in particular by the real-time control requirements of autonomous mobility-on-demand (AMoD) systems. Specifically, it proposes a model-free, real-time, and data-efficient Q-learning algorithm, based on reinforcement learning (RL), for the H∞ control problem. Its main objectives are:

1. **Model-free control**: design a fully model-free adaptive optimal controller whose parameters are learned online without knowledge of the system dynamics.
2. **Reduced computational complexity**: reduce the computational complexity from $\mathcal{O}(\underline{q}^3)$ in existing methods to $\mathcal{O}(\underline{q}^2)$, where $\underline{q}$ grows quadratically with the total number of state variables, control inputs, and disturbance inputs.
3. **No initial stabilizing policy**: converge to the closed-form solution of the Riccati equation without requiring an initial stabilizing policy.
4. **Real-time application**: apply the algorithm in a simulation study of real-time AMoD control, based on a real-world case study, to evaluate its effectiveness.

### Main contributions

1. **A model-free, real-time, and data-efficient algorithm** for the H∞ control problem of linear discrete-time systems.
2. **Reduced computational complexity**, from $\mathcal{O}(\underline{q}^3)$ to $\mathcal{O}(\underline{q}^2)$.
3. **An analysis of the algorithm's properties and a proof of its convergence**.
4. **An application to the AMoD system**, demonstrating the algorithm's performance on a real-world case study.

### Research background

The paper first reviews the background of AMoD systems and highlights the advantages of autonomous vehicles (AVs) in reducing operating costs and improving the user experience. Without proper control, however, an AMoD system can develop a supply-demand imbalance, so effective rebalancing strategies are needed to optimize vehicle dispatching.
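The complexity claim can be made concrete under one assumption. The abstract only says that $\underline{q}$ is quadratic in the combined size of state, control, and disturbance; one natural reading, assumed here, is $\underline{q} = d(d+1)/2$ with $d = n + m + p$, the number of free entries of a symmetric $d \times d$ Q-function kernel. The sketch contrasts two ways of fitting $\underline{q}$ parameters by least squares: a batch solve of the normal equations, whose $\underline{q} \times \underline{q}$ solve costs $\mathcal{O}(\underline{q}^3)$, and recursive least squares (RLS), whose rank-one Sherman-Morrison update costs $\mathcal{O}(\underline{q}^2)$ per sample. The summary does not say how the paper reaches its $\mathcal{O}(\underline{q}^2)$ bound; RLS is used here only as a familiar example of that cost class, and all function names are hypothetical.

```python
import numpy as np


def num_kernel_params(n, m, p):
    """Free entries of a symmetric (n+m+p) x (n+m+p) kernel (assumed meaning of q)."""
    d = n + m + p
    return d * (d + 1) // 2


def batch_ls(Phi, y, reg=1e-3):
    """Ridge-regularized batch least squares: one q x q solve, O(q^3)."""
    q = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(q), Phi.T @ y)


def rls(Phi, y, reg=1e-3):
    """Recursive least squares: O(q^2) work per sample via Sherman-Morrison."""
    q = Phi.shape[1]
    theta = np.zeros(q)
    S = np.eye(q) / reg  # inverse of the regularized Gram matrix seen so far
    for phi, target in zip(Phi, y):
        S_phi = S @ phi
        gain = S_phi / (1.0 + phi @ S_phi)
        theta = theta + gain * (target - phi @ theta)
        S = S - np.outer(gain, S_phi)  # rank-one update, no matrix inverse
    return theta


rng = np.random.default_rng(0)
q = num_kernel_params(n=2, m=1, p=1)  # d = 4, so q = 10
Phi = rng.standard_normal((200, q))   # rows stand in for quadratic features
theta_true = rng.standard_normal(q)
y = Phi @ theta_true
print(q)  # 10
print(np.max(np.abs(rls(Phi, y) - batch_ls(Phi, y))))
```

With the same regularization `reg`, RLS reproduces the batch estimate in exact arithmetic, so the choice between the two is about per-update cost and online operation rather than accuracy.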
### Methods and techniques

The paper adopts a Q-learning-based reinforcement learning method that optimizes the control policy through online learning. The main steps are:

1. **Problem modeling**: model the AMoD system as a linear discrete-time dynamic system, defining its state, control input, and external disturbance.
2. **Q-learning algorithm**: propose a model-free Q-learning algorithm that updates the parameters of the action network and the critic network through online learning.
3. **Algorithm implementation**: describe the online implementation of the algorithm in detail and analyze its properties and convergence.

### Experimental verification

The paper evaluates the proposed algorithm through simulation. The experiment uses a 12-hour historical travel dataset with a 2-minute time step, giving 360 iterations in total. According to the paper, the algorithm performs well in real-time control of the AMoD system, optimizing vehicle dispatching and reducing waiting times.

### Conclusion

The paper proposes a model-free, real-time, and data-efficient Q-learning algorithm for the H∞ control problem of linear discrete-time systems and applies it to an AMoD system. The simulation results support the effectiveness of the algorithm in practical applications.
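The Q-learning step listed under Methods and techniques can be illustrated with a minimal model-free sketch. It assumes the standard zero-sum-game formulation of H∞ control: the Q-function is $z^\top H z$ with $z = [x; u; w]$, each iteration fits $H_{j+1}$ by least squares to the target $r_k + x_{k+1}^\top P_j x_{k+1}$, where $P_j$ is the value kernel implied by $H_j$, and the gains $[K; L] = H_{vv}^{-1} H_{vx}$ (with $v = [u; w]$) are read off the learned kernel. This is a textbook-style Q-learning value iteration, not the paper's algorithm; in particular it refits by batch least squares rather than using the paper's $\mathcal{O}(\underline{q}^2)$ update. The simulator `collect_data` is hypothetical: it samples states independently and applies random probing inputs once, and the learner only sees the resulting `(x, u, w, x_next)` tuples, never `A`, `B`, or `E`.

```python
import numpy as np


def quad_features(z):
    """Products z_i z_j (i <= j), so that z' H z = theta . features for symmetric H."""
    i, j = np.triu_indices(z.size)
    return z[i] * z[j]


def theta_to_kernel(theta, d):
    """Rebuild symmetric H: diagonal entries are theta_ii, off-diagonal theta_ij / 2."""
    i, j = np.triu_indices(d)
    H = np.zeros((d, d))
    H[i, j] = theta
    return (H + H.T) / 2


def collect_data(A, B, E, num_samples, noise=1.0, seed=0):
    """Hypothetical simulator, used only to produce (x, u, w, x_next) samples."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((num_samples, A.shape[0]))
    U = noise * rng.standard_normal((num_samples, B.shape[1]))  # probing input
    W = noise * rng.standard_normal((num_samples, E.shape[1]))  # probing disturbance
    Xn = X @ A.T + U @ B.T + W @ E.T
    return X, U, W, Xn


def q_learning_vi(X, U, W, Xn, Qx, R, gamma, iters=2000, tol=1e-9):
    """Model-free Q-learning value iteration; never touches A, B or E."""
    n, m = X.shape[1], U.shape[1]
    d = n + m + W.shape[1]
    Phi = np.array([quad_features(z) for z in np.hstack([X, U, W])])
    # Stage cost r_k = x'Qx + u'Ru - gamma^2 w'w
    r = (np.einsum("ki,ij,kj->k", X, Qx, X)
         + np.einsum("ki,ij,kj->k", U, R, U)
         - gamma**2 * np.einsum("ki,ki->k", W, W))
    P = np.zeros((n, n))  # value kernel of the current policies, P_0 = 0
    for _ in range(iters):
        target = r + np.einsum("ki,ij,kj->k", Xn, P, Xn)
        theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
        H = theta_to_kernel(theta, d)
        Hxx, Hxv, Hvv = H[:n, :n], H[:n, n:], H[n:, n:]
        G = np.linalg.solve(Hvv, Hxv.T)  # [K; L], with [u; w] = -G x
        P_next = Hxx - Hxv @ G           # Schur complement = value kernel
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    return H, G[:m], G[m:]


A = np.array([[0.95, 0.1], [0.0, 1.02]])
B = np.array([[0.0], [0.1]])
E = np.array([[0.1], [0.0]])
X, U, W, Xn = collect_data(A, B, E, num_samples=200)
H, K, L = q_learning_vi(X, U, W, Xn, Qx=np.eye(2), R=np.eye(1), gamma=5.0)
print("K =", K, "L =", L)
```

Because the exploratory data are collected once and reused at every iteration, probing noise enters only through the initial dataset; whether this mirrors the paper's treatment of probing noise is an assumption. With noise-free data, each iteration reproduces one step of model-based Riccati value iteration, so the learned kernel satisfies the game Riccati fixed point.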

Data-Driven H-infinity Control with a Real-Time and Efficient Reinforcement Learning Algorithm: An Application to Autonomous Mobility-on-Demand Systems

Reinforcement Learning-Based $H_\infty$ Control of 2-D Markov Jump Roesser Systems with Optimal Disturbance Attenuation

Data-Driven Solutions to Mixed $H_2/H_\infty$ Control: A Hamilton-Inequality-Driven Reinforcement Learning Approach

Model-Free $H_2/H_\infty$ Control of Discrete-Time Stochastic Systems: A Reinforcement Learning Method

Data-Efficient Off-Policy Learning for Distributed Optimal Tracking Control of HMAS with Unidentified Exosystem Dynamics

Model-free Reinforcement Learning for $H_2/H_\infty$ Control of Stochastic Discrete-time Systems

Off-Policy Reinforcement Learning for $H_\infty$ Control Design

$H_\infty$ Control Using Reinforcement Learning

Adaptive Q-Learning Based Model-Free $H_\infty$ Control of Continuous-Time Nonlinear Systems: Theory and Application

Online adaptive data-driven control for unknown nonlinear systems with constrained-input

Reinforcement Learning for Finite-Horizon $H_\infty$ Tracking Control of Unknown Discrete Linear Time-Varying System

Reinforcement Q-learning Algorithm for $H_\infty$ Tracking Control of Discrete-Time Markov Jump Systems

Model-free $H_\infty$ control of Itô stochastic system via off-policy reinforcement learning

Reinforcement Learning Reduced $H_\infty$ Output Tracking Control of Nonlinear Two-Time-Scale Industrial Systems

$H_\infty$ Control with Constrained Input for Completely Unknown Nonlinear Systems Using Data-Driven Reinforcement Learning Method

Data-Driven $H_\infty$ Control for Nonlinear Distributed Parameter Systems

Game Theoretical Reinforcement Learning for Robust $H_\infty$ Tracking Control of Discrete-Time Linear Systems with Unknown Dynamics

Direct Data-Driven Discounted Infinite Horizon Linear Quadratic Regulator with Robustness Guarantees