Abstract: Production cost minimization (PCM) simulation is commonly employed for assessing the operational efficiency, economic viability, and reliability of power systems, providing valuable insights for power system planning and operations. However, solving a PCM problem is time-consuming, because it involves numerous binary variables over simulation horizons that extend for months or years. This hinders rapid assessment of modern energy systems with diverse planning requirements. Existing methods for accelerating PCM tend to sacrifice accuracy for speed. In this paper, we propose a stable relay learning optimization (s-RLO) approach within the Branch and Bound (B&B) algorithm. The proposed approach offers rapid and stable performance and ensures optimal solutions. The two-stage s-RLO involves an imitation learning (IL) phase for accurate policy initialization and a reinforcement learning (RL) phase for time-efficient fine-tuning. When implemented on the popular SCIP solver, s-RLO returns the optimal solution up to 2 times faster than the default relpscost rule and 1.4 times faster than IL, or exhibits a smaller gap at the predefined time limit. The proposed approach shows stable performance, reducing fluctuations by approximately 50% compared with IL. The efficacy of the proposed s-RLO approach is supported by numerical results.
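As a reading aid for the speedup figures in the abstract, the conversion between a speedup factor and a percentage reduction in solve time is simple arithmetic. The symbols $T_{\mathrm{ref}}$ (reference solve time), $T_{\mathrm{new}}$ (accelerated solve time), $s$ and $r$ are assumptions introduced here, not notation from the paper, and "k times faster" is read as a solve-time ratio of k.

```latex
\[
s = \frac{T_{\mathrm{ref}}}{T_{\mathrm{new}}}, \qquad
r = 1 - \frac{T_{\mathrm{new}}}{T_{\mathrm{ref}}} = 1 - \frac{1}{s}
\]
\[
s = 2 \;\Rightarrow\; r = 0.5, \qquad
s = 1.4 \;\Rightarrow\; r = 1 - \frac{1}{1.4} \approx 0.286
\]
```

Under this reading, "up to 2 times faster" corresponds to at most a 50% shorter solve time, and "1.4 times faster" to roughly 29% shorter.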
What problem does this paper attempt to address?
The paper aims to address the computational efficiency issues in Production Cost Minimization (PCM) simulations for power systems. Specifically, the study focuses on the following key points:
1. **Problem Background**: PCM simulations are typically used to evaluate the operational efficiency, economic feasibility, and reliability of power systems, which are of significant value for power system planning and operation. However, actual PCM problems often involve a large number of binary variables, especially when the time span reaches monthly or yearly scales, leading to substantial time consumption in solving such problems.
2. **Limitations of Existing Methods**: Existing methods to accelerate PCM solutions often sacrifice accuracy for speed. These methods include techniques based on binary reduction, relaxation, and partition, but they may result in inaccurate outcomes.
3. **Proposed Solution**: To address the above issues, the paper proposes a method called Stable Relay Learning Optimization (s-RLO). This method works within the Branch and Bound (B&B) algorithm framework and combines Imitation Learning (IL) and Reinforcement Learning (RL) techniques.
- **Imitation Learning Phase**: The policy network is first initialized by imitating the default relpscost rule in the SCIP solver, which quickly yields a preliminary variable selection policy.
- **Reinforcement Learning Phase**: The policy network is then fine-tuned through reinforcement learning, that is, through continued interaction with the environment, to improve solving speed.
4. **Main Contributions**:
- Enhanced the traditional B&B algorithm based on the open-source SCIP solver, significantly speeding up the PCM problem-solving process.
- Designed a two-phase s-RLO method that combines imitation learning and reinforcement learning to acquire and improve variable selection strategies.
- Obtained optimal solutions quickly. Additionally, the s-RLO framework demonstrates consistency and stability, maintaining performance even in the face of environmental changes.
5. **Case Analysis**: Experiments on a PJM 5-bus system show that the s-RLO method can significantly reduce solving time. For example, over a 336-hour horizon, the average solving time was reduced by approximately 50% compared to relpscost; for the 720-hour and 1440-hour horizons, s-RLO further shortened the solving time relative to IL.
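The two-phase training described in item 3 can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation: the linear softmax policy, the two candidate features, the `expert` function (a stand-in for the solver's default rule), and the `reward` function (a made-up stand-in for a shorter solve) are all hypothetical. REINFORCE is used here only as a generic policy-gradient method; this summary does not say which RL algorithm or network the paper uses.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def policy_probs(w, candidates):
    """Linear softmax policy: probability of branching on each candidate."""
    return softmax([sum(wi * xi for wi, xi in zip(w, x)) for x in candidates])

def grad_log_prob(w, candidates, chosen):
    """Gradient of log pi(chosen | candidates) with respect to w."""
    p = policy_probs(w, candidates)
    mean = [sum(pj * x[k] for pj, x in zip(p, candidates)) for k in range(len(w))]
    return [candidates[chosen][k] - mean[k] for k in range(len(w))]

def imitation_phase(w, states, expert, lr=0.5, epochs=50):
    """Phase 1 (IL): behaviour cloning, maximising the log-likelihood of the expert's choice."""
    for _ in range(epochs):
        for cands in states:
            g = grad_log_prob(w, cands, expert(cands))
            w = [wi + lr * gi for wi, gi in zip(w, g)]
    return w

def reinforce_phase(w, states, reward, rng, lr=0.1, episodes=200):
    """Phase 2 (RL): REINFORCE updates with a running-average baseline."""
    baseline = 0.0
    for _ in range(episodes):
        cands = rng.choice(states)
        probs = policy_probs(w, cands)
        action = rng.choices(range(len(cands)), weights=probs)[0]
        r = reward(cands, action)
        g = grad_log_prob(w, cands, action)
        w = [wi + lr * (r - baseline) * gi for wi, gi in zip(w, g)]
        baseline += 0.1 * (r - baseline)
    return w

# Hypothetical toy data: each state lists (f0, f1) features per branching candidate.
STATES = [
    [(0.9, 0.1), (0.2, 0.8), (0.1, 0.3)],
    [(0.3, 0.9), (0.8, 0.2), (0.5, 0.5)],
    [(0.1, 0.2), (0.4, 0.1), (0.95, 0.6)],
]

def expert(cands):
    # Stand-in for the solver's default rule: pick the candidate with the largest f0.
    return max(range(len(cands)), key=lambda j: cands[j][0])

def reward(cands, action):
    # Made-up reward (higher is better), standing in for a shorter solve time.
    f0, f1 = cands[action]
    return 0.4 * f0 + 0.6 * f1

w_il = imitation_phase([0.0, 0.0], STATES, expert)                 # IL initialisation
w_rl = reinforce_phase(w_il, STATES, reward, random.Random(0))     # RL fine-tuning
```

The RL phase starts from the weights returned by the IL phase rather than from scratch, which is the "relay" idea in miniature: imitation supplies a reasonable starting policy, and the reward signal then adjusts it.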
In summary, the paper proposes a new learning optimization method, s-RLO, aimed at addressing the computational efficiency issues of large-scale PCM problems while ensuring result accuracy. This method leverages the advantages of imitation learning and reinforcement learning, significantly improving solving speed while maintaining solution quality.
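To illustrate why changing only the variable selection rule can alter solving effort without affecting optimality, here is a small sketch of B&B with a pluggable `select` function. It is a simplified stand-in, not SCIP and not the paper's PCM model: it solves a hypothetical 0-1 knapsack instance, bounds each node with the fractional relaxation, and lets the rule branch on any undecided variable. The two rules `lowest_index` and `best_ratio` are illustrative names; a learned policy would occupy the same slot.

```python
def branch_and_bound(values, weights, capacity, select):
    """Depth-first B&B for 0-1 knapsack; `select` picks the branching variable."""
    n = len(values)
    best, nodes = 0, 0
    stack = [{}]  # each node is a partial assignment {variable index: 0 or 1}
    while stack:
        fixed = stack.pop()
        nodes += 1
        used = sum(weights[i] for i, v in fixed.items() if v)
        if used > capacity:
            continue  # infeasible node: weights are positive, so prune the subtree
        value = sum(values[i] for i, v in fixed.items() if v)
        best = max(best, value)  # setting all free variables to 0 is feasible
        free = [i for i in range(n) if i not in fixed]
        # Upper bound from the fractional relaxation over the free variables.
        bound, room = float(value), capacity - used
        for i in sorted(free, key=lambda i: values[i] / weights[i], reverse=True):
            take = min(1.0, room / weights[i])
            bound += take * values[i]
            room -= take * weights[i]
            if room <= 0:
                break
        if not free or bound <= best:
            continue  # leaf, or the subtree cannot beat the incumbent
        i = select(free, values, weights)
        stack.append({**fixed, i: 0})
        stack.append({**fixed, i: 1})
    return best, nodes

def lowest_index(free, values, weights):
    # Naive rule: branch on the first undecided variable.
    return min(free)

def best_ratio(free, values, weights):
    # Heuristic rule: branch on the undecided variable with the best value/weight ratio.
    return max(free, key=lambda i: values[i] / weights[i])

# Hypothetical instance, chosen only for illustration.
VALUES = [10, 13, 7, 8, 4, 12]
WEIGHTS = [5, 6, 3, 4, 2, 7]
CAPACITY = 12

opt_a, nodes_a = branch_and_bound(VALUES, WEIGHTS, CAPACITY, lowest_index)
opt_b, nodes_b = branch_and_bound(VALUES, WEIGHTS, CAPACITY, best_ratio)
print("same optimum:", opt_a == opt_b, "| nodes explored:", nodes_a, nodes_b)
```

Because pruning relies only on feasibility and a valid upper bound, any selection rule returns the same optimal value; the rule affects only how many nodes are explored, which is the quantity a learned policy aims to reduce.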