Multi-Objective Reinforcement Learning-based Approach for Pressurized Water Reactor Optimization

Paul Seurin, Koroush Shirvan
2024-03-16
Abstract: A novel method, the Pareto Envelope Augmented with Reinforcement Learning (PEARL), has been developed to address the challenges posed by multi-objective problems, particularly in the field of engineering where the evaluation of candidate solutions can be time-consuming. PEARL distinguishes itself from traditional policy-based multi-objective Reinforcement Learning methods by learning a single policy, eliminating the need for multiple neural networks to independently solve simpler sub-problems. Several versions inspired by deep learning and evolutionary techniques have been crafted, catering to both unconstrained and constrained problem domains. Curriculum Learning is harnessed to effectively manage constraints in these versions. PEARL's performance is first evaluated on classical multi-objective benchmarks. Additionally, it is tested on two practical PWR core Loading Pattern optimization problems to showcase its real-world applicability. The first problem involves optimizing the cycle length and the rod-integrated peaking factor as the primary objectives, while the second problem incorporates the mean average enrichment as an additional objective. Furthermore, PEARL addresses three types of constraints related to boron concentration, peak pin burnup, and peak pin power. The results are systematically compared against conventional approaches. Notably, PEARL, specifically the PEARL-NdS variant, efficiently uncovers a Pareto front without necessitating additional efforts from the algorithm designer, as opposed to a single optimization with scaled objectives. It also outperforms the classical approach across multiple performance metrics, including the Hyper-volume.
Machine Learning, Computational Engineering, Finance, and Science
What problem does this paper attempt to address?
This paper addresses the multi-objective optimization of fuel loading patterns (LPs) in pressurized water reactors (PWRs). It proposes a new method, Pareto Envelope Augmented with Reinforcement Learning (PEARL), aimed at multi-objective engineering problems in which evaluating each candidate solution is time-consuming. The main problems can be summarized as follows:

1. **Multi-objective optimization**: Traditional single-objective optimization methods cannot consider several objectives at once. In PWR LP optimization, the cycle length (LC) and the rod-integrated peaking factor (F∆h) must be optimized simultaneously, and sometimes a further objective such as average enrichment is added. These objectives can conflict, so a method that handles the trade-offs among them is required.
2. **Constraints**: Practical PWR designs are subject to constraints on quantities such as boron concentration (Cb), peak fuel rod burnup (Bumax), and peak fuel rod power (Fq). These constraints shrink the space of feasible solutions and make the optimization harder.
3. **Computational efficiency**: Existing multi-objective optimization methods are computationally expensive on large-scale combinatorial problems, which makes it difficult to find high-quality solutions within a reasonable time. A more efficient method is therefore needed.
4. **Algorithm generality**: Existing multi-objective optimization methods usually have to be tuned for each specific problem and lack generality. PEARL learns a single policy rather than training multiple neural networks on separate sub-problems, which is intended to make the algorithm more general and easier to apply.
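To make the notions of conflicting objectives and a Pareto front concrete, the following is a minimal Python sketch of non-dominated filtering and a two-objective hypervolume calculation. It is an illustration under stated assumptions, not the paper's implementation: every objective is assumed to be expressed as a minimization (a cycle length to be maximized would enter with its sign flipped), and the function names `dominates`, `pareto_front`, and `hypervolume_2d` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(front: List[Point], ref: Point) -> float:
    """Area dominated by a two-objective minimization front, bounded by the
    reference point `ref` (which should be worse than every front point)."""
    # Sort by the first objective; on a true front the second then decreases.
    pts = sorted(p for p in front if p[0] <= ref[0] and p[1] <= ref[1])
    area, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:  # skip points that add no new area
            area += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return area


# Toy candidates (not reactor data): (3, 3) is dominated by (2, 2).
candidates = [(1, 4), (2, 2), (3, 3), (4, 1)]
front = pareto_front(candidates)
area = hypervolume_2d(front, ref=(5, 5))
```

A larger hypervolume means the front covers more of the objective space relative to the reference point, which is why it is commonly used to compare fronts produced by different algorithms.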
To meet these challenges, the paper proposes the PEARL algorithm and its variants, which handle the multiple objectives and constraints of PWR LP optimization through reinforcement learning (RL) and curriculum learning (CL). Specifically, the paper relies on the following elements:

- **Single-policy learning**: Unlike traditional methods, PEARL solves multi-objective optimization problems by learning a single policy instead of using multiple neural networks to independently solve simpler sub-problems.
- **Non-uniformity penalty term**: A non-uniformity penalty term and trace vectors are introduced to ensure the diversity of the Pareto envelope.
- **Reward mechanism**: The reward is designed around the preference vector and the sampling process so that the algorithm can generate a well-distributed Pareto envelope in a single run.
- **Curriculum learning**: Curriculum learning is used to manage constraints by moving gradually from simple tasks to complex ones, which improves the algorithm's performance.

With these innovations, the paper demonstrates the superior performance of PEARL on classical multi-objective benchmarks and on practical PWR LP optimization problems, and makes a systematic comparison with traditional stochastic optimization methods.
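One way to picture the curriculum idea for constraints is a staged reward: an infeasible candidate is scored only on how far it violates the limits, and a candidate earns its multi-objective reward only once every constraint is satisfied. The Python sketch below is a hedged illustration of that general pattern, not the reward defined in the paper. The limit values in `LIMITS`, the metric keys, and the function names are all hypothetical, and the multi-objective reward is assumed to be non-negative so that any feasible candidate outranks any infeasible one.

```python
from typing import Dict

# Illustrative upper limits only; these are assumptions, not values from the paper.
LIMITS: Dict[str, float] = {
    "cb_ppm": 1200.0,  # boron concentration
    "bu_max": 62.0,    # peak pin burnup
    "fq": 1.85,        # peak pin power
}


def violation(metrics: Dict[str, float], limits: Dict[str, float] = LIMITS) -> float:
    """Sum of the normalized amounts by which each upper limit is exceeded."""
    return sum(max(0.0, (metrics[k] - lim) / lim) for k, lim in limits.items())


def curriculum_reward(metrics: Dict[str, float], objective_reward: float) -> float:
    """Stage 1: while any constraint is violated, reward = -violation.
    Stage 2: once feasible, return the multi-objective reward
    (assumed non-negative, e.g. a rank- or crowding-based score)."""
    v = violation(metrics)
    if v > 0.0:
        return -v
    return objective_reward


feasible = {"cb_ppm": 1150.0, "bu_max": 60.0, "fq": 1.80}
infeasible = {"cb_ppm": 1320.0, "bu_max": 60.0, "fq": 1.85}

r_feasible = curriculum_reward(feasible, objective_reward=3.0)
r_infeasible = curriculum_reward(infeasible, objective_reward=3.0)
```

Under this staging the agent first learns to reach the feasible region, which is the simpler task, before the reward signal starts to distinguish among feasible loading patterns.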