Abstract: A novel method, the Pareto Envelope Augmented with Reinforcement Learning (PEARL), has been developed to address the challenges posed by multi-objective problems, particularly in engineering, where evaluating candidate solutions can be time-consuming. PEARL distinguishes itself from traditional policy-based multi-objective Reinforcement Learning methods by learning a single policy, eliminating the need for multiple neural networks to independently solve simpler sub-problems. Several versions inspired by deep learning and evolutionary techniques have been crafted, catering to both unconstrained and constrained problem domains. Curriculum Learning is harnessed to manage constraints in these versions. PEARL's performance is first evaluated on classical multi-objective benchmarks. It is then tested on two practical PWR core Loading Pattern optimization problems to showcase its real-world applicability. The first problem optimizes the cycle length and the rod-integrated peaking factor as the primary objectives, while the second adds the average enrichment as a third objective. Furthermore, PEARL addresses three types of constraints related to boron concentration, peak pin burnup, and peak pin power. The results are systematically compared against conventional approaches. Notably, PEARL, specifically the PEARL-NdS variant, efficiently uncovers a Pareto front without requiring additional effort from the algorithm designer, as opposed to a single optimization with scaled objectives. It also outperforms the classical approach across multiple performance metrics, including the hypervolume.
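
The abstract names the hypervolume as one of the comparison metrics. As a rough illustration of what that metric measures, the sketch below computes it for a two-objective front. It assumes both objectives are minimized and that a reference point is supplied by the user; the function name `hypervolume_2d` and the sample points are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def hypervolume_2d(front: List[Point], ref: Point) -> float:
    """Area dominated by `front` and bounded by `ref` (both objectives minimized)."""
    # Keep only points that improve on the reference point in both objectives,
    # sorted by the first objective (ties broken by the second).
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:
        # A point adds area only if it improves the second objective
        # relative to every point already swept (i.e. it is non-dominated).
        if f2 < best_f2:
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


# Illustrative values only: three mutually non-dominated points.
example_front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
example_hv = hypervolume_2d(example_front, ref=(4.0, 4.0))
```

A larger hypervolume means the front covers more of the objective space relative to the reference point, which is why it is commonly used to compare fronts produced by different optimizers.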

What problem does this paper attempt to address?

This paper addresses the multi-objective optimization challenges that arise in fuel loading pattern (LP) optimization for pressurized water reactors (PWRs). It proposes a new method, Pareto Envelope Augmented with Reinforcement Learning (PEARL), for multi-objective optimization problems in engineering, especially those where evaluating candidate solutions takes a great deal of time. The main problems can be summarized as follows:

1. **Multi-objective optimization**: Traditional single-objective optimization methods cannot consider multiple objectives simultaneously. In PWR LP optimization, for example, the cycle length (LC) and the rod-integrated peaking factor (F∆h) must be optimized together, and sometimes other objectives such as average enrichment must be considered as well. These objectives may conflict, so a method capable of handling multi-objective optimization is required.
2. **Constraints**: Actual PWR design is subject to various constraints, such as boron concentration (Cb), peak fuel rod burnup (Bumax), and peak fuel rod power (Fq). These constraints limit the space of feasible solutions and make optimization harder.
3. **Computational efficiency**: Existing multi-objective optimization methods have high computational costs on large-scale combinatorial optimization problems, and it is difficult to find the optimal solution within a reasonable time. An efficient method is therefore needed.
4. **Algorithm generality**: Existing multi-objective optimization methods usually need to be adjusted for specific problems and lack generality. The proposed method aims to solve multiple sub-problems with a single policy, reducing the need for multiple neural networks and improving the generality and applicability of the algorithm.

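
The conflict between objectives described above is usually formalized with Pareto dominance. The following is a minimal sketch of a dominance check and a non-dominated filter, assuming every objective is expressed as a minimization (cycle length is negated so that longer cycles score lower). The helper names and the candidate values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates as (-cycle_length, peaking_factor); values are made up.
candidates = [(-500.0, 1.50), (-510.0, 1.55), (-495.0, 1.52), (-505.0, 1.48)]
pareto = non_dominated(candidates)
```

In this made-up example two candidates survive: one with the longest cycle and one with the lowest peaking factor. Neither is better in both objectives, which is exactly the trade-off a Pareto front is meant to expose.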
To meet these challenges, the paper proposes the PEARL algorithm and its variants, which address the multi-objective and constrained aspects of PWR LP optimization using reinforcement learning (RL) and curriculum learning (CL). The approach rests on the following elements:

- **Single-policy learning**: Unlike traditional methods, PEARL solves multi-objective optimization problems by learning a single policy instead of using multiple neural networks to independently solve simpler sub-problems.
- **Non-uniformity penalty term**: A non-uniformity penalty term and trace vectors are introduced to ensure the diversity of the Pareto envelope.
- **Reward mechanism**: The reward is designed around a preference vector and a sampling process so that the algorithm can generate a well-distributed Pareto envelope in one run.
- **Curriculum learning**: Curriculum learning is used to manage constraints, moving gradually from simple tasks to complex ones to improve the performance of the algorithm.

With these elements, the paper reports that PEARL outperforms traditional stochastic optimization methods, through a systematic comparison on classical multi-objective benchmarks and on practical PWR LP optimization problems.
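
One way to read the curriculum-learning idea above is as a staged reward: the agent is first rewarded only for reducing constraint violations, and the objective-based reward applies only once a candidate is feasible. The sketch below illustrates that reading. It is an assumption about the general technique, not the paper's actual reward; the function name, the dictionary keys, and the numeric limits are all hypothetical.

```python
from typing import Dict


def curriculum_reward(
    values: Dict[str, float],
    limits: Dict[str, float],
    objective_score: float,
) -> float:
    """Staged reward: feasibility first, then the objective (assumed non-negative)."""
    # Relative amount by which each upper-bound constraint is exceeded.
    violation = sum(max(0.0, values[k] - limits[k]) / limits[k] for k in limits)
    if violation > 0.0:
        # Stage 1: infeasible candidates only receive a (negative) violation signal.
        return -violation
    # Stage 2: feasible candidates are ranked by the objective-based score.
    return objective_score


# Hypothetical upper limits for boron concentration, peak pin burnup, peak pin power.
limits = {"cb": 1200.0, "bu_max": 62.0, "fq": 1.85}
infeasible = {"cb": 1260.0, "bu_max": 60.0, "fq": 1.80}
feasible = {"cb": 1150.0, "bu_max": 60.0, "fq": 1.80}

r_infeasible = curriculum_reward(infeasible, limits, objective_score=0.7)
r_feasible = curriculum_reward(feasible, limits, objective_score=0.7)
```

Because the objective score is assumed non-negative, any feasible candidate is rewarded at least as much as any infeasible one, so the agent has an incentive to satisfy the constraints before refining the objectives.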

Multi-Objective Reinforcement Learning-based Approach for Pressurized Water Reactor Optimization

Physics-informed Reinforcement Learning optimization of PWR core loading pattern

Surpassing legacy approaches to PWR core reload optimization with single-objective Reinforcement learning

Assessment of reinforcement learning algorithms for nuclear power plant fuel optimization

Combining Reinforcement Learning with Mathematical Programming: an Approach for Optimal Design of Heat Exchanger Networks

Multi-objective Optimization of Operating Parameters Based on Neural Network and Genetic Algorithm in the Blast Furnace

Reactor Optimization Benchmark by Reinforcement Learning

Multistep Criticality Search and Power Shaping in Microreactors with Reinforcement Learning

A Novel Multi-Objective Optimization Method for the Pressurized Reservoir in Hydraulic Robotics

Multi-objective optimization of thermal power and outlet steam temperature for a nuclear steam supply system with deep reinforcement learning

Design Optimization for Pressurized Water Reactor Using Improved Quantum Fish Swarm Algorithm and Intuitionistic Linguistic Decision-Making

Towards Pareto-optimal energy management in integrated energy systems: A multi-agent and multi-objective deep reinforcement learning approach

Design Optimization of Nuclear Fusion Reactor through Deep Reinforcement Learning

Multiobjective Genetic Algorithm Strategies for Burnable Poison Design of Pressurized Water Reactor

Pearl: A Production-ready Reinforcement Learning Agent

Optimal controller design for reactor core power stabilization in a pressurized water reactor: Applications of gold rush algorithm

Multi-Objective Deep Reinforcement Learning for Optimisation in Autonomous Systems

Possibilities of reinforcement learning for nuclear power plants: Evidence on current applications and beyond

A Safe Reinforcement Learning Algorithm for Supervisory Control of Power Plants