RL-Driven MPPI: Accelerating Online Control Laws Calculation with Offline Policy

Yue Qu, Hongqing Chu, Shuhua Gao, Jun Guan, Haoqi Yan, Liming Xiao, Shengbo Eben Li, Jingliang Duan
DOI: https://doi.org/10.1109/tiv.2023.3348134
IF: 8.2
2024-01-01
IEEE Transactions on Intelligent Vehicles
Abstract: Model Predictive Path Integral (MPPI) is a recognized sampling-based approach for finite-horizon optimal control problems. However, the efficacy and computational efficiency of prevailing MPPI methods are heavily reliant on the quality of rollouts. This is problematic because it is hard to sample a low-cost trajectory using random control sequences, thereby leading to inferior performance and computational efficiency, especially under constrained resources. To address this issue, we propose a data-efficient MPPI method called reinforcement learning-driven MPPI (RL-driven MPPI), which significantly reduces the dependency on the quantity and quality of samples. RL-driven MPPI employs an offline-online policy learning scheme, where the offline policy learned by RL serves as the initial solution and the initial rollout generator of MPPI, effectively combining the strengths of both RL and MPPI. The rollouts generated by RL typically correspond to a lower cost-to-go compared to random sampling, which significantly boosts the sample efficiency and convergence speed of MPPI. Moreover, the value function learned by RL offers an accurate estimation for infinite-horizon cost-to-go, enabling it to serve as a terminal term for the cost criteria of MPPI. This approach empowers MPPI to approximate an infinite-horizon cost with a shorter prediction horizon, thus enhancing real-time performance at each time step. An unmanned aerial vehicle control task is conducted to evaluate the proposed method. Results indicate that the proposed RL-driven MPPI method exhibits superior control performance and sample efficiency.
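The two ideas in the abstract (seeding MPPI with rollouts from an offline policy, and closing a short horizon with a learned value function as the terminal cost) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 1-D single-integrator dynamics in `step`, the quadratic `stage_cost`, the linear feedback `policy` standing in for a trained RL policy, the quadratic `value` standing in for a learned value function, and all parameter values. The update itself is the standard exponentially weighted MPPI average over sampled perturbations.

```python
import math
import random


def step(x, u, dt=0.1):
    # Hypothetical 1-D single-integrator dynamics (illustrative only).
    return x + u * dt


def stage_cost(x, u):
    # Hypothetical quadratic running cost.
    return x * x + 0.1 * u * u


def policy(x):
    # Stand-in for an offline RL policy; a real one would be a trained network.
    return -1.0 * x


def value(x):
    # Stand-in for a learned value function used as the terminal cost.
    return 2.0 * x * x


def policy_rollout(x0, horizon):
    # Generate the initial (nominal) control sequence by rolling out the policy.
    x, controls = x0, []
    for _ in range(horizon):
        u = policy(x)
        controls.append(u)
        x = step(x, u)
    return controls


def trajectory_cost(x0, controls):
    # Finite-horizon running cost plus the value function at the final state.
    x, total = x0, 0.0
    for u in controls:
        total += stage_cost(x, u)
        x = step(x, u)
    return total + value(x)


def mppi_update(x0, nominal, num_samples=64, sigma=0.3, lam=1.0, rng=None):
    # One MPPI iteration: perturb the nominal sequence with Gaussian noise,
    # score each perturbed rollout, and take the exponentially weighted
    # average of the perturbations.
    rng = rng or random.Random(0)
    horizon = len(nominal)
    noises = [[rng.gauss(0.0, sigma) for _ in range(horizon)]
              for _ in range(num_samples)]
    costs = [trajectory_cost(x0, [u + e for u, e in zip(nominal, eps)])
             for eps in noises]
    best = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(weights)
    return [u + sum(w * eps[t] for w, eps in zip(weights, noises)) / total
            for t, u in enumerate(nominal)]


if __name__ == "__main__":
    x0, horizon = 1.0, 5
    warm_start = policy_rollout(x0, horizon)  # policy-seeded nominal sequence
    refined = mppi_update(x0, warm_start)
    print("cost of policy warm start:", trajectory_cost(x0, warm_start))
    print("cost after one MPPI update:", trajectory_cost(x0, refined))
```

In this toy setting the policy-seeded sequence already has a lower cost than an all-zero sequence, which mirrors the abstract's point that sampling around a good initial solution needs fewer rollouts. A single MPPI update is not guaranteed to lower the cost further, so the script prints both costs rather than asserting an improvement.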