Prediction Horizon-Varying Model Predictive Control (MPC) for Autonomous Vehicle Control

Zhenbin Chen, Jiaqin Lai, Peixin Li, Omar I. Awad, Yubing Zhu
DOI: https://doi.org/10.3390/electronics13081442
IF: 2.9
2024-04-12
Electronics
Abstract: The prediction horizon is a key parameter in model predictive control (MPC) that affects both the effectiveness and the stability of the controller. In vehicle control, the choice of prediction horizon is influenced by factors such as speed, path curvature, and target point density. To accommodate varying conditions such as road curvature and vehicle speed, we propose a control strategy, called PPO-MPC, that uses the proximal policy optimization (PPO) algorithm to adjust the prediction horizon so that MPC can achieve optimal performance. We establish a state space built from path information and vehicle state, treat the prediction horizon as the action, and design a reward function to optimize the policy and value function. We run simulations at various speeds and compare the results with an MPC using fixed prediction horizons. The simulations show that the proposed PPO-MPC exhibits strong adaptability and trajectory tracking capability.
engineering, electrical & electronic; computer science, information systems; physics, applied
What problem does this paper attempt to address?
The paper addresses how to adjust the prediction horizon dynamically in autonomous vehicle control, according to road curvature and vehicle speed, so as to improve the performance and adaptability of model predictive control (MPC). Specifically, it proposes a prediction-horizon-varying MPC method (PPO-MPC) based on the proximal policy optimization (PPO) algorithm, which adjusts the prediction horizon to optimize MPC performance under different environmental conditions.

### Background and Problem Description

- **Background**: As autonomous driving technology develops, the effectiveness of trajectory-tracking control directly affects the overall performance and safety of vehicle operation. Optimizing trajectory-tracking control nevertheless remains challenging, especially in complex road environments.
- **Problem**: Traditional MPC methods use a fixed prediction horizon, which may degrade control performance or cause instability when road curvature and vehicle speed vary. Dynamically adjusting the prediction horizon to suit the driving conditions has therefore become an important research direction.

### Solution

- **Method**: The paper proposes PPO-MPC, a method that combines the PPO algorithm with MPC. It is implemented through the following steps:
  1. **Establishment of State Space**: Define a state space \( S(t)=[c(t), v(t), \delta(t), \text{acc}(t), e(t), \text{cost}(t)] \) that contains path information and vehicle states, where \( c(t) \) is the curvature of the reference trajectory, \( v(t) \) is the speed, \( \delta(t) \) is the steering angle, \( \text{acc}(t) \) is the acceleration, \( e(t) \) is the lateral error, and \( \text{cost}(t) \) is the cost of the MPC system.
  2. **Action Space**: Take the prediction horizon \( N_p \) as the action and map the PPO output from the range \([-1, 1]\) to the range \([1, N_{\max}]\) by linear scaling, so that the output of the PPO algorithm meets the requirements of MPC.
  3. **Design of Reward Function**: Design a reward function \( R(t)=w_1 e^{-(\lambda_1 |e_1|+\lambda_2 |e_2|+\lambda_3 |e_3|)}-w_2 H_1 - w_3 H_2 \), where \( e_1 \) is the lateral tracking deviation, \( e_2 \) is the longitudinal speed deviation, and \( H_1 \) and \( H_2 \) are the smoothness and stability indicators of the control output, respectively. The reward function aims to balance smooth driving against keeping the tracking error within an acceptable range.

### Experimental Verification

- **Simulation Results**: The paper verifies the performance of PPO-MPC at different vehicle speeds through simulation and compares it with MPC using a fixed prediction horizon. The results show that PPO-MPC exhibits stronger adaptability and trajectory-tracking ability under different conditions.

### Conclusion

The PPO-MPC method proposed in the paper adjusts the prediction horizon dynamically, thereby achieving better control performance and stability under different road curvatures and vehicle speeds and improving the trajectory-tracking ability of autonomous vehicles.
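
To make the action-space mapping and the reward function from the Solution steps concrete, the following is a minimal Python sketch, not the authors' implementation. The function names `scale_action` and `reward`, the clipping of the raw action, the rounding to an integer horizon, and the default weights are all assumptions introduced for illustration. The summary does not define \( e_3 \), so it is passed in as an unspecified third deviation term, and \( H_1 \), \( H_2 \) are treated as precomputed inputs.

```python
import math


def scale_action(a: float, n_max: int) -> int:
    """Map a raw policy output a in [-1, 1] linearly onto [1, n_max].

    Clipping and rounding to the nearest integer are assumptions; the
    summary only states that linear scaling is used.
    """
    a = max(-1.0, min(1.0, a))                  # keep the action inside [-1, 1]
    n_p = 1.0 + (a + 1.0) * 0.5 * (n_max - 1)   # -1 -> 1, +1 -> n_max
    return int(round(n_p))


def reward(e1: float, e2: float, e3: float, h1: float, h2: float,
           w=(1.0, 0.1, 0.1), lam=(1.0, 1.0, 1.0)) -> float:
    """R = w1 * exp(-(l1|e1| + l2|e2| + l3|e3|)) - w2*H1 - w3*H2.

    e1: lateral tracking deviation, e2: longitudinal speed deviation,
    e3: third deviation term (not defined in the summary),
    h1, h2: smoothness and stability indicators of the control output.
    The default weights are placeholders, not values from the paper.
    """
    w1, w2, w3 = w
    l1, l2, l3 = lam
    tracking = w1 * math.exp(-(l1 * abs(e1) + l2 * abs(e2) + l3 * abs(e3)))
    return tracking - w2 * h1 - w3 * h2


if __name__ == "__main__":
    # Hypothetical values: N_max = 21 and a raw PPO action of 0.0.
    print(scale_action(0.0, 21))
    print(reward(e1=0.2, e2=0.5, e3=0.0, h1=0.1, h2=0.05))
```

With this form, the exponential term equals \( w_1 \) when all deviations are zero and decays toward zero as they grow, while the \( H_1 \) and \( H_2 \) terms subtract a penalty for control outputs that are not smooth or stable.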