Rethinking Safe Policy Learning for Complex Constraints Satisfaction: A Glimpse in Real-Time Security Constrained Economic Dispatch Integrating Energy Storage Units

Jianxiong Hu, Yujian Ye, Yizhi Wu, Peilin Zhao, Liu
DOI: https://doi.org/10.1109/tpwrs.2024.3419894
IF: 7.326
2024-01-01
IEEE Transactions on Power Systems
Abstract: Reinforcement learning (RL) for real-time security constrained economic dispatch (RT-SCED) problems has been the subject of significant research interest in recent years. However, ordinary RL approaches struggle to ensure satisfaction of system- and device-wise constraints, having to penalize constraint violations individually. With the increasing penetration of renewable energy sources (RES), large-scale integration of energy storage is being witnessed, driven by its ability to mitigate RES intermittency. This gives rise to the need for time-coupling constraint satisfaction in RT-SCED problems. Existing safe RL methods either rectify unsafe actions at each time step with a safety layer, which may yield sub-optimal actions at the boundary of the feasible space and may violate time-coupling constraints, or construct a safety evaluation model, which may violate single-step constraints. To address these limitations, this paper proposes a novel safe deep RL method featuring safety exploration and safety optimization modules, which together facilitate comprehensive satisfaction of single-step and time-coupling constraints. Furthermore, the policy network features a residual network architecture and allows direct computation of the real-valued dispatch of all controllable resources, adapting to their distinct power output ranges. Case studies on the IEEE 39-bus and 118-bus test systems validate the effectiveness of the proposed method in terms of cost efficiency, operational security, computational performance, and scalability, compared to state-of-the-art model-driven and data-driven baseline methods.
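The limitation attributed to per-step safety layers can be illustrated with a toy storage unit. The sketch below is not the paper's method: the function names, the lossless state-of-charge update, and all numbers are assumptions chosen only for illustration. It clips each action to the instantaneous power limit and then shows that the resulting state-of-charge trajectory can still leave its allowed range, which is the kind of time-coupling violation the abstract refers to.

```python
# Illustrative sketch only; names and numbers are hypothetical, not from the paper.

def clip_power(p, p_max):
    """Per-step rectification: enforce only the instantaneous power limit."""
    return max(-p_max, min(p_max, p))


def rollout_soc(actions, soc0, p_max, dt=1.0):
    """Clip each action, then accumulate state of charge (charging is positive).

    Assumes a lossless storage model: soc[t+1] = soc[t] + p[t] * dt.
    """
    soc, trajectory = soc0, []
    for p in actions:
        soc += clip_power(p, p_max) * dt
        trajectory.append(soc)
    return trajectory


def soc_violations(trajectory, e_min, e_max):
    """Return the time steps at which the state of charge leaves [e_min, e_max]."""
    return [t for t, soc in enumerate(trajectory) if soc < e_min or soc > e_max]


# The policy requests 3.0 per step; the power limit is 2.0, so every action is clipped.
actions = [3.0, 3.0, 3.0, 3.0]
traj = rollout_soc(actions, soc0=4.0, p_max=2.0)
violations = soc_violations(traj, e_min=0.0, e_max=8.0)

print(traj)        # [6.0, 8.0, 10.0, 12.0]
print(violations)  # [2, 3]
```

Every clipped action satisfies the single-step power limit, yet the storage exceeds its energy capacity from the third step onward, because the capacity constraint depends on the accumulated history of actions rather than on any one action.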