Abstract: Reinforcement learning (RL) is a class of artificial intelligence algorithms being used to design adaptive optimal controllers through online learning. This paper presents a model-free, real-time, data-efficient Q-learning-based algorithm to solve the H$_{\infty}$ control of linear discrete-time systems. The computational complexity is shown to reduce from $\mathcal{O}(\underline{q}^3)$ in the literature to $\mathcal{O}(\underline{q}^2)$ in the proposed algorithm, where $\underline{q}$ is quadratic in the sum of the size of state variables, control inputs, and disturbance. An adaptive optimal controller is designed and the parameters of the action and critic networks are learned online without the knowledge of the system dynamics, making the proposed algorithm completely model-free. Also, a sufficient probing noise is only needed in the first iteration and does not affect the proposed algorithm. With no need for an initial stabilizing policy, the algorithm converges to the closed-form solution obtained by solving the Riccati equation. A simulation study is performed by applying the proposed algorithm to real-time control of an autonomous mobility-on-demand (AMoD) system for a real-world case study to evaluate the effectiveness of the proposed algorithm.
What problem does this paper attempt to address?
### Problems the paper attempts to solve
This paper addresses the H∞ control problem for linear discrete-time systems, motivated by the real-time control requirements of autonomous mobility-on-demand (AMoD) systems. It proposes a model-free, real-time, data-efficient Q-learning algorithm, a reinforcement learning (RL) method, to solve this problem. The main objectives of the paper are:
1. **Model-free control**: Design a fully model-free adaptive optimal controller whose parameters are learned online without knowledge of the system dynamics.
2. **Reduce computational complexity**: Reduce the computational complexity from the O(q^3) reported in the literature to O(q^2), where q is quadratic in the sum of the numbers of state variables, control inputs, and disturbances.
3. **No need for an initial stabilizing policy**: The algorithm converges to the closed-form solution obtained by solving the Riccati equation without requiring an initial stabilizing policy.
4. **Real-time application**: Apply the proposed algorithm, in a simulation study, to real-time control of an AMoD system for a real-world case to evaluate its effectiveness.
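To give a feel for the complexity objective, the short sketch below assumes that q counts the independent entries of a symmetric quadratic Q-function kernel over the stacked state, control, and disturbance vector, that is, q = p(p+1)/2 with p the sum of the three dimensions. This convention is common in Q-learning for linear quadratic problems and is consistent with "quadratic in the sum", but the exact definition should be checked against the paper. The function name and the example dimensions are hypothetical.

```python
def num_q_parameters(n_state: int, n_control: int, n_disturbance: int) -> int:
    """Independent entries of a symmetric p x p kernel (assumed definition of q)."""
    p = n_state + n_control + n_disturbance
    return p * (p + 1) // 2


# Hypothetical dimensions: 4 states, 2 control inputs, 2 disturbance inputs.
q = num_q_parameters(n_state=4, n_control=2, n_disturbance=2)

# Compare the growth of the two complexity orders for this q.
print(q, q**2, q**3)  # prints: 36 1296 46656
```

Under this assumption, even a small system makes the gap between the two orders substantial, since q itself grows quadratically with the problem dimensions.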
### Main contributions
1. **Propose a model-free, real-time, data-efficient algorithm** for solving the H∞ control problem of linear discrete-time systems.
2. **Reduce computational complexity** from O(q^3) to O(q^2).
3. **Discuss the properties of the algorithm and prove its convergence**.
4. **Apply the algorithm to the AMoD system** and evaluate it in a simulation study based on a real-world case.
### Research background
The paper first reviews the research background of AMoD systems and emphasizes the advantages of autonomous vehicles (AVs) in reducing operating costs and improving user experience. Without proper control, however, an AMoD system can suffer from supply-demand imbalance, so effective rebalancing strategies are needed to optimize vehicle scheduling.
### Methods and techniques
The paper adopts a Q-learning-based reinforcement learning method that optimizes the control policy through online learning. The specific steps are as follows:
1. **Problem modeling**: Model the AMoD system as a linear discrete-time dynamic system and define the system's state, control input, and external disturbance.
2. **Q-learning algorithm**: Propose a model-free Q-learning algorithm that updates the parameters of the action network and the critic network through online learning.
3. **Algorithm implementation**: Describe the online implementation of the algorithm in detail and analyze its properties and convergence.
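For orientation, the sketch below shows a model-based reference computation, not the paper's model-free algorithm: a value iteration on the game Riccati equation of a linear discrete-time zero-sum game, which is the kind of closed-form solution a Q-learning scheme is expected to converge to. It assumes dynamics x_{k+1} = A x_k + B u_k + E w_k and stage cost x'Qx + u'Ru - gamma^2 w'w. The matrices A, B, E, Q, R, the attenuation level gamma, and the function name are all illustrative assumptions, not values or notation from the paper.

```python
import numpy as np


def game_riccati_value_iteration(A, B, E, Q, R, gamma, iters=500, tol=1e-10):
    """Iterate the game Riccati recursion from P = 0 (model-based reference).

    Returns the value kernel P and the gains K, L such that
    u = K x (controller) and w = L x (worst-case disturbance).
    """
    n, m1, m2 = A.shape[0], B.shape[1], E.shape[1]
    G = np.hstack([B, E])  # stacked input matrix [B E]
    # Block-diagonal weight on [u; w]: R for control, -gamma^2 I for disturbance.
    D = np.block([[R, np.zeros((m1, m2))],
                  [np.zeros((m2, m1)), -gamma**2 * np.eye(m2)]])
    P = np.zeros((n, n))  # no initial stabilizing policy is needed here
    for _ in range(iters):
        M = D + G.T @ P @ G
        P_next = Q + A.T @ P @ A - A.T @ P @ G @ np.linalg.solve(M, G.T @ P @ A)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    gains = -np.linalg.solve(D + G.T @ P @ G, G.T @ P @ A)
    return P, gains[:m1], gains[m1:]


# Hypothetical scalar example (all numbers are made up for illustration).
A = np.array([[0.9]])
B = np.array([[1.0]])
E = np.array([[0.5]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
P, K, L = game_riccati_value_iteration(A, B, E, Q, R, gamma=5.0)
print(P, K, L)
```

A model-free method would estimate the same quantities from measured state, input, and disturbance data rather than from A, B, and E. Note that this recursion is only well behaved when gamma is large enough for the game to have a solution; the sketch does not check that condition.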
### Experimental verification
The paper evaluates the proposed algorithm through simulation experiments. The experiment uses a 12-hour historical travel data set with a time step of 2 minutes, giving 360 iterations in total (12 × 60 / 2 = 360). The reported results indicate that the algorithm performs well in real-time control of the AMoD system, optimizing vehicle scheduling and reducing waiting time.
### Conclusion
The paper proposes a model-free, real-time, data-efficient Q-learning algorithm for solving the H∞ control problem of linear discrete-time systems and applies it to an AMoD system. The simulation results support the effectiveness of the algorithm in a practical application.