Abstract: Production cost minimization (PCM) simulation is commonly employed for assessing the operational efficiency, economic viability, and reliability of power systems, providing valuable insights for power system planning and operations. However, solving a PCM problem is time-consuming because it involves numerous binary variables when the simulation horizon extends over months or years. This hinders rapid assessment of modern energy systems with diverse planning requirements. Existing methods for accelerating PCM tend to sacrifice accuracy for speed. In this paper, we propose a stable relay learning optimization (s-RLO) approach within the Branch and Bound (B&B) algorithm. The proposed approach offers rapid and stable performance and ensures optimal solutions. The two-stage s-RLO involves an imitation learning (IL) phase for accurate policy initialization and a reinforcement learning (RL) phase for time-efficient fine-tuning. When implemented on the popular SCIP solver, s-RLO returns the optimal solution up to 2 times faster than the default relpscost rule and 1.4 times faster than IL, or exhibits a smaller gap at the predefined time limit. The proposed approach also shows stable performance, reducing fluctuations by approximately 50% compared with IL. The efficacy of the proposed s-RLO approach is supported by numerical results.
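
To make the source of the binary variables concrete, here is a generic, textbook-style unit-commitment sketch of a PCM problem. It is an illustrative assumption, not the formulation used in the paper: the symbols $u_{g,t}$ (on/off status of unit $g$ in hour $t$), $p_{g,t}$ (dispatch), $c_g$ (marginal cost), $c^{\mathrm{NL}}_g$ (no-load cost), $D_t$ (demand), and the limits $\underline{P}_g$, $\overline{P}_g$ are all introduced here for illustration, and start-up costs, ramping, and network constraints are omitted.

$$
\begin{aligned}
\min_{u,\,p}\quad & \sum_{t=1}^{T}\sum_{g=1}^{G}\bigl(c_g\,p_{g,t} + c^{\mathrm{NL}}_g\,u_{g,t}\bigr) \\
\text{s.t.}\quad & \sum_{g=1}^{G} p_{g,t} = D_t, && t = 1,\dots,T, \\
& \underline{P}_g\,u_{g,t} \le p_{g,t} \le \overline{P}_g\,u_{g,t}, && g = 1,\dots,G,\ t = 1,\dots,T, \\
& u_{g,t} \in \{0,1\}, && g = 1,\dots,G,\ t = 1,\dots,T.
\end{aligned}
$$

Under this sketch the number of binary variables is $G \cdot T$, so it grows linearly with the horizon: a hypothetical system with $G = 5$ units has $5 \times 336 = 1680$ binaries over 336 hours and $5 \times 1440 = 7200$ over 1440 hours.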

What problem does this paper attempt to address?

The paper aims to address the computational efficiency issues in Production Cost Minimization (PCM) simulations for power systems. Specifically, the study focuses on the following key points:

1. **Problem Background**: PCM simulations are typically used to evaluate the operational efficiency, economic viability, and reliability of power systems, which makes them valuable for power system planning and operation. However, practical PCM problems often involve a large number of binary variables, especially when the simulation horizon spans months or years, so solving them takes a long time.
2. **Limitations of Existing Methods**: Existing methods for accelerating PCM often sacrifice accuracy for speed. These methods include techniques based on binary reduction, relaxation, and partition, but they may produce inaccurate results.
3. **Proposed Solution**: To address these issues, the paper proposes a method called Stable Relay Learning Optimization (s-RLO). The method works within the Branch and Bound (B&B) algorithm framework and combines Imitation Learning (IL) and Reinforcement Learning (RL) techniques.
   - **Imitation Learning Phase**: The policy network is first initialized by imitating the behavior of the default relpscost rule in the SCIP solver, which quickly yields a preliminary policy.
   - **Reinforcement Learning Phase**: The policy network is then fine-tuned through reinforcement learning to improve solving speed. This phase refines the policy through continued interaction with the environment.
4. **Main Contributions**:
   - Enhanced the traditional B&B algorithm on top of the open-source SCIP solver, significantly speeding up the PCM solving process.
   - Designed a two-phase s-RLO method that combines imitation learning and reinforcement learning to acquire and improve variable selection strategies.
   - Achieved fast and optimal solution results.
   - Additionally, the s-RLO framework demonstrates consistency and stability, maintaining performance even when the environment changes.
5. **Case Analysis**: Experiments on a PJM 5-bus system show that the s-RLO method can significantly reduce solving time. For example, over a 336-hour horizon, the average solving time was reduced by approximately 50% compared to relpscost; for 720-hour and 1440-hour horizons, s-RLO further shortened the solving time relative to IL.

In summary, the paper proposes a new learning optimization method, s-RLO, aimed at addressing the computational efficiency issues of large-scale PCM problems while preserving result accuracy. The method leverages the advantages of imitation learning and reinforcement learning, significantly improving solving speed while maintaining solution quality.
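
The summary's central point, that changing the variable selection rule inside B&B can change solving effort without giving up optimality, can be illustrated with a toy sketch. The code below is not SCIP, not the paper's method, and not a PCM model: it is a minimal depth-first B&B on a small 0/1 knapsack problem, assuming positive integer weights, with the branching rule passed in as a function. All names (`lp_bound`, `branch_and_bound`, `first_free`, `fractional_item`, `brute_force`) are hypothetical and introduced here only for illustration; the two hand-written rules stand in for a default rule such as relpscost and a learned policy.

```python
from itertools import product


def lp_bound(values, weights, capacity, fixed):
    """LP-relaxation bound of a node via the fractional-knapsack greedy.

    `fixed` maps item index -> 0/1. Returns None if the node is infeasible,
    otherwise (bound, index of the fractional item or None if integral).
    """
    used = sum(weights[i] for i, x in fixed.items() if x == 1)
    if used > capacity:
        return None
    bound = float(sum(values[i] for i, x in fixed.items() if x == 1))
    room = capacity - used
    free = sorted((i for i in range(len(values)) if i not in fixed),
                  key=lambda i: values[i] / weights[i], reverse=True)
    frac = None
    for i in free:
        if weights[i] <= room:
            room -= weights[i]
            bound += values[i]
        else:
            if room > 0:  # take a fraction of the first item that does not fit
                bound += values[i] * room / weights[i]
                frac = i
            break
    return bound, frac


def branch_and_bound(values, weights, capacity, select):
    """Depth-first B&B; `select` picks the branching variable among free items."""
    best, nodes = 0, 0
    stack = [{}]
    while stack:
        fixed = stack.pop()
        nodes += 1
        result = lp_bound(values, weights, capacity, fixed)
        if result is None:
            continue                      # infeasible node
        bound, frac = result
        if bound <= best:
            continue                      # pruned by bound
        if frac is None:
            best = bound                  # integral relaxation: new incumbent
            continue
        free = [i for i in range(len(values)) if i not in fixed]
        j = select(free, frac, values, weights)
        stack.append({**fixed, j: 0})
        stack.append({**fixed, j: 1})     # explored first (LIFO)
    return best, nodes


def first_free(free, frac, values, weights):
    """Naive rule: branch on the lowest-index free variable."""
    return free[0]


def fractional_item(free, frac, values, weights):
    """Branch on the variable that is fractional in the LP relaxation."""
    return frac


def brute_force(values, weights, capacity):
    """Exhaustive reference optimum, only practical for tiny instances."""
    best = 0
    for x in product((0, 1), repeat=len(values)):
        if sum(w * xi for w, xi in zip(weights, x)) <= capacity:
            best = max(best, sum(v * xi for v, xi in zip(values, x)))
    return best


if __name__ == "__main__":
    v, w, cap = [10, 13, 7, 8, 12, 4], [3, 4, 2, 3, 5, 1], 9
    for rule in (first_free, fractional_item):
        opt, n = branch_and_bound(v, w, cap, rule)
        print(f"{rule.__name__}: optimum={opt}, nodes explored={n}")
    print("brute force optimum:", brute_force(v, w, cap))
```

Both rules reach the same optimum because pruning relies only on valid bounds; the rule affects only how many nodes are explored. In this sketch a learned policy would simply be another `select` function, which is the sense in which a learning-based rule can target speed while the B&B framework still certifies optimality.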

Stable Relay Learning Optimization Approach for Fast Power System Production Cost Minimization Simulation
