Online Prediction-Assisted Safe Reinforcement Learning for Electric Vehicle Charging Station Recommendation in Dynamically Coupled Transportation-Power Systems

Qionghua Liao, Guilong Li, Jiajie Yu, Ziyuan Gu, Wei Ma
2024-07-30
Abstract: With the proliferation of electric vehicles (EVs), the transportation network and power grid become increasingly interdependent and coupled via charging stations. The concomitant growth in charging demand has posed challenges for both networks, highlighting the importance of charging coordination. Existing literature largely overlooks the interactions between power grid security and traffic efficiency. In view of this, we study the en-route charging station (CS) recommendation problem for EVs in dynamically coupled transportation-power systems. The system-level objective is to maximize the overall traffic efficiency while ensuring the safety of the power grid. This problem is for the first time formulated as a constrained Markov decision process (CMDP), and an online prediction-assisted safe reinforcement learning (OP-SRL) method is proposed to learn the optimal and secure policy by extending the PPO method. To be specific, we mainly address two challenges. First, the constrained optimization problem is converted into an equivalent unconstrained optimization problem by applying the Lagrangian method. Second, to account for the uncertain long-time delay between performing CS recommendation and commencing charging, we put forward an online sequence-to-sequence (Seq2Seq) predictor for state augmentation to guide the agent in making forward-thinking decisions. Finally, we conduct comprehensive experimental studies based on the Nguyen-Dupuis network and a large-scale real-world road network, coupled with IEEE 33-bus and IEEE 69-bus distribution systems, respectively. Results demonstrate that the proposed method outperforms baselines in terms of road network efficiency, power grid safety, and EV user satisfaction. The case study on the real-world network also illustrates the applicability in the practical context.
Computational Engineering, Finance, and Science
What problem does this paper attempt to address?
The paper addresses the following problem: in a dynamically coupled transportation-power system, how should charging stations be recommended to electric vehicles (EVs) en route so that overall traffic efficiency is maximized while the power grid stays safe? As the number of EVs rises, charging stations tie the transportation network and the power grid ever more closely together, and the growing charging demand strains both networks. Most existing studies overlook the interaction between grid safety and traffic efficiency. This paper formulates the problem as a Constrained Markov Decision Process (CMDP), which the authors state is a first, and proposes an Online Prediction-assisted Safe Reinforcement Learning (OP-SRL) method to learn optimal and safe policies.

### Main Problem Breakdown:

1. **System-level Objective**: Maximize overall traffic efficiency while ensuring the safe operation of the power system.
   - **Traffic Efficiency**: Measured by the total travel time of all vehicles (EVs that need charging and vehicles that do not) within a given period, which is to be minimized.
   - **Power System Safety**: Measured by the voltage deviation at grid nodes, i.e., the difference between the actual operating voltage and the nominal voltage, which should be as small as possible.
2. **Main Challenges**:
   - **Constrained Optimization Problem**: How to incorporate constraints into sequential decision-making, given that conventional reinforcement learning methods optimize only a long-term objective.
   - **Uncertain Long Delays**: The delay between recommending a charging station and the start of charging is long and uncertain, which may mislead the policy and destabilize training.
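The breakdown above can be written compactly in the generic CMDP form. The notation is an assumption made for illustration and is not taken from the paper: $r$ stands for a per-step reward that penalizes travel time, $c$ for a per-step cost that measures voltage deviation, $d$ for a safety threshold, and $\gamma$ for a discount factor.

```latex
\max_{\pi}\; J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
J_C(\pi) = \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, c(s_t, a_t)\right] \le d
```

A standard reinforcement learning agent optimizes only $J_R$; the constraint on $J_C$ is what makes the problem a CMDP and motivates the first challenge listed above.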
### Solution:

- **Lagrangian Method**: Converts the constrained optimization problem into an equivalent unconstrained one, and extends the Proximal Policy Optimization (PPO) method with a cost critic and Lagrangian multipliers to handle the constraints.
- **Online Sequence-to-Sequence (Seq2Seq) Predictor**: Used for state augmentation. It predicts future charging demand at the charging stations, giving the agent forward-looking information so that it can make more foresighted decisions.

### Experimental Validation:

- **Experimental Setup**: Experiments are conducted on the Nguyen-Dupuis network and a large-scale real-world road network, coupled with the IEEE 33-bus and IEEE 69-bus distribution systems, respectively.
- **Performance Evaluation**: Three performance metrics are designed to evaluate road network efficiency, power system safety, and EV user satisfaction.

Across these experiments, the paper reports that the proposed OP-SRL method outperforms the baselines in road network efficiency, power system safety, and EV user satisfaction.
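The Lagrangian step listed under Solution can be sketched in a few lines of Python. This is a minimal illustration of a generic PPO-Lagrangian update, not the authors' implementation: the function names, the learning rate, the clip range, the cost limit, and the division by `1 + lam` are all assumptions chosen for the example.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one sample: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def lagrangian_advantage(reward_adv, cost_adv, lam):
    """Fold the cost critic's advantage into the reward advantage.

    Dividing by (1 + lam) is a common rescaling that keeps the combined
    signal bounded as lam grows; it is an assumption, not a detail from the paper.
    """
    return (reward_adv - lam * cost_adv) / (1.0 + lam)


def update_multiplier(lam, mean_episode_cost, cost_limit, lr=0.05):
    """Projected gradient ascent on the multiplier.

    lam rises while the measured cost exceeds the limit, falls otherwise,
    and is clamped at zero so it remains a valid Lagrangian multiplier.
    """
    return max(0.0, lam + lr * (mean_episode_cost - cost_limit))


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical mean voltage-deviation costs over three training iterations.
    for mean_cost in [0.08, 0.07, 0.04]:
        lam = update_multiplier(lam, mean_cost, cost_limit=0.05)
        adv = lagrangian_advantage(reward_adv=1.0, cost_adv=0.5, lam=lam)
        print(lam, clipped_surrogate(ratio=1.1, advantage=adv))
```

The multiplier acts as an adaptive penalty weight: constraint violations push it up, so the policy update leans more heavily on the cost advantage, and it relaxes again once the measured cost drops below the limit.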