Deep Reinforcement Learning based Energy Management for Heavy Duty HEV considering Discrete-Continuous Hybrid Action Space

Zemin Eitan Liu, Yanfei Li, Quan Zhou, Yong Li, Bin Shuai, Hongming Xu, Min Hua, Guikun Tan, Lubing Xu
DOI: https://doi.org/10.1109/TTE.2024.3363650
IF: 6.519
2024-01-01
IEEE Transactions on Transportation Electrification
Abstract: P2 parallel hybridization is a promising way to reduce the fuel consumption of heavy duty logistic vehicles (HDLVs), and deep reinforcement learning (DRL) is a promising method for optimizing energy management strategies (EMSs). However, the discrete-continuous hybrid action space inherent in the P2 system makes real-time optimal control challenging. This paper therefore proposes a novel DRL algorithm that combines auto-tune soft actor-critic (ATSAC) with ordinal regression to optimize engine torque output and gear shifting simultaneously. ATSAC automatically adjusts the update frequency and learning rate of SAC to improve generalization, while ordinal regression converts the discrete variables into samples in a continuous space so that the hybrid action can be handled. Moreover, a multi-dimensional scenario-oriented driving cycle (SODC) is built from naturalistic driving big data (NDBD) and used as the training cycle to further improve EMS generalization. In a comprehensive comparison with EMSs based on the widely used twin-delayed deep deterministic policy gradient (TD3), ATSAC achieves 53.70% higher computational efficiency and 12.31% lower negative total reward (NTR) during training. Application analysis in unseen real-world driving scenarios shows that only the ATSAC-based EMS achieves real-time optimal control during testing. Furthermore, the EMS trained on SODC obtains 81.73% lower NTR than the one trained on the standard China World Transient Vehicle Cycle (CWTVC), which demonstrates that SODC represents real-world driving scenarios much more accurately than CWTVC, especially in the low-speed, high-load conditions that are crucial for HDLVs.
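The abstract does not spell out how the hybrid action is decoded, so the following is only a minimal sketch of one common ordinal-style scheme, not the authors' implementation. It assumes the actor emits a tanh-squashed vector in [-1, 1] whose first entry is the continuous engine-torque command and whose remaining entries act as ordered "shift above gear k" indicators. The function name `decode_hybrid_action`, the zero threshold, the gear count, and the torque limit are all hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple


def decode_hybrid_action(
    action: Sequence[float],
    num_gears: int,
    max_engine_torque: float,
) -> Tuple[float, int]:
    """Map a continuous actor output to (engine torque, gear index).

    Assumed layout (hypothetical, not taken from the paper):
      action[0]   -> engine torque command in [-1, 1]
      action[1:]  -> num_gears - 1 ordinal channels in [-1, 1];
                     channel k > 0 is read as "gear is above k".
    """
    if len(action) != num_gears:
        raise ValueError("expected 1 torque channel + (num_gears - 1) ordinal channels")

    # Clip to the squashed range in case exploration noise pushed values outside it.
    clipped = [max(-1.0, min(1.0, a)) for a in action]

    # Continuous part: rescale [-1, 1] to [0, max_engine_torque].
    torque = (clipped[0] + 1.0) / 2.0 * max_engine_torque

    # Discrete part: count the ordinal channels that fire, giving a gear in 1..num_gears.
    gear = 1 + sum(1 for a in clipped[1:] if a > 0.0)
    return torque, gear


if __name__ == "__main__":
    # Illustrative values only: 5 gears, 2000 N·m torque limit.
    print(decode_hybrid_action([0.0, 0.9, 0.4, -0.2, -0.8], 5, 2000.0))
```

Counting the channels that fire keeps neighboring gears close together in the continuous action space, which is the property that lets a continuous-action learner such as SAC be applied to gear selection; a stricter variant would stop counting at the first channel that does not fire.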