A Safe Reinforcement Learning Algorithm for Supervisory Control of Power Plants

Yixuan Sun, Sami Khairy, Richard B. Vilim, Rui Hu, Akshay J. Dave
2024-01-24
Abstract: Traditional control theory-based methods require tailored engineering for each system and constant fine-tuning. In power plant control, one often needs to obtain a precise representation of the system dynamics and carefully design the control scheme accordingly. Model-free reinforcement learning (RL) has emerged as a promising solution for control tasks due to its ability to learn from trial-and-error interactions with the environment. It eliminates the need for an explicit model of the environment's dynamics, which may be inaccurate. However, the direct imposition of state constraints in power plant control raises challenges for standard RL methods. To address this, we propose a chance-constrained RL algorithm based on Proximal Policy Optimization for supervisory control. Our method employs Lagrangian relaxation to convert the constrained optimization problem into an unconstrained objective, where trainable Lagrange multipliers enforce the state constraints. Our approach achieves the smallest distance of violation and violation rate in a load-follow maneuver for an advanced nuclear power plant design.
Systems and Control, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses safe reinforcement learning for the supervisory control of nuclear power plants (NPPs). Specifically, it focuses on:

1. **Proposing a new safe reinforcement learning algorithm**: The algorithm is based on Proximal Policy Optimization (PPO). It uses Lagrangian relaxation to transform the constrained optimization problem into an unconstrained one and introduces trainable Lagrange multipliers to enforce the state constraints.
2. **Creating a physics-based learning environment**: A simplified model identified with SINDYc (sparse identification of nonlinear dynamics with control) serves as an efficient learning environment. It reduces the time required for simulation feedback and thereby accelerates training of the reinforcement learning agent.
3. **Implementing supervisory control in advanced nuclear reactors**: Trained reinforcement learning agents control an advanced nuclear reactor during routine operational transients while keeping system states within specified constraints, which reduces equipment wear and improves economic benefits.
4. **Achieving strong control performance**: In load-following operations, the proposed model reduces the total power variation by up to 50% compared to traditional methods.

Through these methods, the paper aims to develop a reinforcement learning control strategy that can handle complex operational conditions while ensuring safety and efficiency.
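
The Lagrangian relaxation described in item 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the state bounds, the tolerance, and the step size are all hypothetical, and the multipliers are updated by plain projected dual ascent rather than by whatever optimizer the paper uses. The sketch only shows the general idea: each state constraint contributes a penalty weighted by its multiplier, and a multiplier grows when the observed violation rate exceeds the allowed tolerance.

```python
from typing import List, Sequence


def violation_indicators(state: Sequence[float],
                         lower: Sequence[float],
                         upper: Sequence[float]) -> List[float]:
    """Return 1.0 for each state variable outside [lower, upper], else 0.0."""
    return [1.0 if (s < lo or s > hi) else 0.0
            for s, lo, hi in zip(state, lower, upper)]


def penalized_reward(reward: float,
                     indicators: Sequence[float],
                     multipliers: Sequence[float]) -> float:
    """Unconstrained (relaxed) reward: task reward minus weighted violations."""
    return reward - sum(lam * c for lam, c in zip(multipliers, indicators))


def update_multipliers(multipliers: Sequence[float],
                       violation_rates: Sequence[float],
                       tolerance: float,
                       step_size: float) -> List[float]:
    """Projected dual ascent (an assumed update rule).

    A multiplier increases when its empirical violation rate exceeds the
    tolerated rate and decreases otherwise; it is clipped at zero.
    """
    return [max(0.0, lam + step_size * (rate - tolerance))
            for lam, rate in zip(multipliers, violation_rates)]


# Hypothetical batch of two-variable states collected from rollouts.
batch = [[1.0, 3.0], [1.5, 4.5], [2.5, 3.5], [0.5, 5.0]]
lower, upper = [0.0, 0.0], [2.0, 4.0]

indicators = [violation_indicators(s, lower, upper) for s in batch]
# Per-constraint empirical violation rate over the batch.
rates = [sum(column) / len(batch) for column in zip(*indicators)]

multipliers = update_multipliers([0.0, 0.0], rates, tolerance=0.1, step_size=0.5)
shaped = [penalized_reward(1.0, ind, multipliers) for ind in indicators]
print(rates, multipliers, shaped)
```

In a full training loop, the shaped rewards would feed the PPO policy update, and the multiplier update would alternate with it, so the penalty tightens only for constraints that are actually being violated too often.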