Multi-service Provision for Electric Vehicles in Power-Transportation Networks Towards a Low-Carbon Transition: A Hierarchical and Hybrid Multi-Agent Reinforcement Learning Approach

Dawei Qiu, Yi Wang, Mingyang Sun, Goran Strbac
DOI: https://doi.org/10.1016/j.apenergy.2022.118790
IF: 11.2
2022-01-01
Applied Energy
Abstract: In order to achieve the targets of carbon peak and carbon neutrality, electric vehicles (EVs) have received increasing attention as a means of electrifying the transportation sector, owing to their mobility and flexibility in handling complicated transportation and power networks. However, realizing the significant potential of EVs in an emerging low-carbon transition remains challenging. Previous works have focused on vehicle-to-grid (V2G) technology, which allows EVs to be used more intensively to perform arbitrage on temporal differences in electricity prices. Nevertheless, without an appropriate business model, the economic potential of EV flexibility may not be fully exploited. This paper addresses this challenge by developing a coupled power-transportation network in which cooperative EVs optimize the provision of multiple interdependent services, including charging service, demand management service, carbon intensity service, and balancing service. To unlock this value, the EV operation problem has previously been tackled using model-based optimization approaches, which may raise privacy concerns because they require global information, and which can also be time-consuming due to the high variability of transportation and power networks. In this paper, we propose a model-free hierarchical and hybrid multi-agent reinforcement learning method that learns the routing and scheduling decisions of EVs in a coupled power-transportation network with the objective of optimizing multi-service provision. As a result, EVs do not rely on any knowledge of the simulated environment and are capable of handling system uncertainties through the learning process. Extensive case studies based on a 15-bus radial power distribution network and a 9-node, 12-edge transportation network show that the proposed method outperforms conventional learning algorithms in terms of policy quality and convergence speed.
Finally, the generalizability and scalability of the method are also investigated under different environmental conditions and numbers of EVs.
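One plausible reading of "hierarchical and hybrid" is that each EV agent makes a discrete routing decision (which node to move to next) at the upper level and a continuous scheduling decision (charge or discharge power) at the lower level, conditioned on the route. The sketch below illustrates only that action structure under loudly stated assumptions: the 3x3 grid topology (chosen merely to have 9 nodes and 12 edges), the 50 kW power limit, the 60 kWh battery, the 0.95 efficiency, and the names `HybridAction`, `select_action`, and `update_soc` are all hypothetical; random choices stand in for the learned policies, driving energy consumption is ignored, and this is not the authors' algorithm.

```python
import random
from dataclasses import dataclass

# Hypothetical 9-node, 12-edge transportation graph laid out as a 3x3 grid.
# Illustrative only; this is NOT the network topology used in the paper.
EDGES = [(0, 1), (1, 2), (3, 4), (4, 5), (6, 7), (7, 8),
         (0, 3), (3, 6), (1, 4), (4, 7), (2, 5), (5, 8)]
ADJ = {n: [] for n in range(9)}
for a, b in EDGES:
    ADJ[a].append(b)
    ADJ[b].append(a)


@dataclass
class HybridAction:
    next_node: int   # discrete routing decision (upper level)
    power_kw: float  # continuous scheduling decision: + charge, - discharge (lower level)


def select_action(node: int, rng: random.Random, p_max_kw: float = 50.0) -> HybridAction:
    """Two-level action selection; random choices stand in for learned policies."""
    # Upper level: move to a neighbouring node or stay at the current one.
    next_node = rng.choice(ADJ[node] + [node])
    # Lower level: the power decision is conditioned on the routing decision.
    # Assumption: an EV only exchanges power with the grid while it stays put.
    power = rng.uniform(-p_max_kw, p_max_kw) if next_node == node else 0.0
    return HybridAction(next_node, power)


def update_soc(soc: float, action: HybridAction, capacity_kwh: float = 60.0,
               dt_h: float = 1.0, eta: float = 0.95) -> float:
    """Update state of charge (fraction in [0, 1]) with a simple efficiency model.

    Driving energy consumption is deliberately omitted in this sketch.
    """
    energy_kwh = action.power_kw * dt_h
    if energy_kwh >= 0:
        delta = eta * energy_kwh / capacity_kwh    # charging losses
    else:
        delta = energy_kwh / (eta * capacity_kwh)  # discharging losses
    return min(1.0, max(0.0, soc + delta))         # keep SoC within physical bounds


if __name__ == "__main__":
    rng = random.Random(0)
    node, soc = 0, 0.5
    for t in range(5):
        action = select_action(node, rng)
        soc = update_soc(soc, action)
        node = action.next_node
        print(t, action, round(soc, 3))
```

Conditioning the continuous power choice on the discrete routing choice is what makes the action space both hierarchical and hybrid in this sketch; a learning method would replace both random draws with trained policies.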