Computationally Efficient Safe Reinforcement Learning for Power Systems

Daniel Tabas, Baosen Zhang
DOI: https://doi.org/10.48550/arXiv.2110.10333
2022-03-23
Abstract: We propose a computationally efficient approach to safe reinforcement learning (RL) for frequency regulation in power systems with high levels of variable renewable energy resources. The approach draws on set-theoretic control techniques to craft a neural network-based control policy that is guaranteed to satisfy safety-critical state constraints, without needing to solve a model predictive control or projection problem in real time. By exploiting the properties of robust controlled-invariant polytopes, we construct a novel, closed-form "safety-filter" that enables end-to-end safe learning using any policy gradient-based RL algorithm. We then apply the safety filter in conjunction with the deep deterministic policy gradient (DDPG) algorithm to regulate frequency in a modified 9-bus power system, and show that the learned policy is more cost-effective than robust linear feedback control techniques while maintaining the same safety guarantee. We also show that the proposed paradigm outperforms DDPG augmented with constraint violation penalties.
Systems and Control
What problem does this paper attempt to address?
This paper addresses how to achieve safe reinforcement learning (RL) for frequency regulation in power systems, especially those with a high share of variable renewable energy resources. Specifically, it aims to design a computationally efficient RL method that guarantees safety-critical state constraints are satisfied during frequency regulation, without solving a model predictive control or projection problem in real time. The goal is a controller that keeps the system safe while improving control performance.

### Background of the Paper and Problem Description

Power systems are critical infrastructure: violating operating constraints can lead to large-scale outages, with severe economic losses and casualties. As renewable energy resources such as wind and solar are integrated, it becomes particularly important to keep system states (such as generator frequency and bus voltage) within a safe region.

Traditional controller design relies mainly on set-theoretic control techniques, which keep the system state within the safe region by computing a robust controlled-invariant (RCI) set. However, these methods usually require simplifying assumptions that lead to suboptimal controller performance. For example:

- Assuming that the disturbance is bounded but otherwise arbitrary.
- Restricting the RCI set to simple geometric objects, such as polyhedra or ellipsoids.
- Restricting the controller to a linear control policy, which forces a trade-off between performance and robustness.
- Linearizing the nonlinear system and assuming that the linearization error is bounded.

In addition, although data-driven methods can improve performance through learning, they usually require solving an optimization problem in real time, which may be too computationally expensive.
Therefore, the central question is how to make a reinforcement learning controller both safe and effective without adding significant computational burden.

### Research Objectives

This paper proposes a method that combines the advantages of set-theoretic control and reinforcement learning to achieve safe frequency regulation. Its specific objectives are:

1. **Ensure safety**: Construct a closed-form safety filter that maps the output of the neural network to the current set of safe actions, so that the control policy is safe by construction.
2. **Improve performance**: Use reinforcement learning to train a neural-network-based controller that improves control performance while maintaining safety.
3. **Computational efficiency**: Avoid solving an optimization problem every time an action is executed, thereby reducing computational cost.

With this approach, the authors aim to significantly improve frequency regulation performance while guaranteeing safety, without complex real-time optimization at each decision step.
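
The idea of a closed-form map from a neural network output to a set of safe actions can be illustrated with a gauge-style rescaling. The following is a minimal sketch under stated assumptions, not necessarily the authors' exact construction: it assumes the safe action set at the current state is already available as a bounded polytope {u : F u <= g}, that a strictly interior point `center` of that polytope is known, and that the raw network output `v` lies in the box [-1, 1]^m (for example after a tanh layer). The names `dot`, `gauge`, `safety_filter`, `F`, `g`, and `center`, as well as the example polytope, are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gauge(w: Sequence[float], F: Matrix, g: Sequence[float]) -> float:
    """Minkowski gauge of the polytope {w : F w <= g}; requires every g_i > 0."""
    return max(dot(row, w) / g_i for row, g_i in zip(F, g))


def safety_filter(v: Sequence[float], F: Matrix, g: Sequence[float],
                  center: Sequence[float]) -> List[float]:
    """Map v in [-1, 1]^m into {u : F u <= g} without solving an optimization.

    Assumption (not from the source): `center` is strictly inside the polytope
    and the polytope is bounded.
    """
    # Shift coordinates so that `center` becomes the origin of the polytope.
    g_shift = [g_i - dot(row, center) for row, g_i in zip(F, g)]
    if min(g_shift) <= 0.0:
        raise ValueError("center must lie strictly inside the polytope")

    v_inf = max(abs(x) for x in v)  # gauge of the unit box, i.e. the infinity norm
    if v_inf == 0.0:
        return list(center)

    gamma = gauge(v, F, g_shift)
    if gamma <= 0.0:
        raise ValueError("polytope is unbounded in the direction of v")

    # Rescale v so that the boundary of the box lands on the boundary of the polytope.
    scale = v_inf / gamma
    return [c + scale * x for c, x in zip(center, v)]


# Hypothetical 2-D safe action set: -1 <= u1 <= 2, -1 <= u2 <= 1, u1 + u2 <= 2.
F_EXAMPLE = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [1.0, 1.0]]
G_EXAMPLE = [2.0, 1.0, 1.0, 1.0, 2.0]
CENTER_EXAMPLE = [0.0, 0.0]

if __name__ == "__main__":
    raw_action = [0.5, -1.0]  # stand-in for a tanh-bounded network output
    print(safety_filter(raw_action, F_EXAMPLE, G_EXAMPLE, CENTER_EXAMPLE))
```

In this sketch the filter costs one pass over the rows of `F` plus a rescaling, so nothing is solved iteratively at decision time. The map is also piecewise smooth in `v`, which is the kind of property that would let gradients flow through it during policy gradient training. If `v` could leave the box [-1, 1]^m, it would need to be clipped first, since the guarantee above relies on the infinity norm of `v` being at most 1.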