Abstract: The integrated community energy system (ICES) has emerged as a promising solution for enhancing the efficiency of the distribution system by effectively coordinating multiple energy sources. However, the operational optimization of ICES is hindered by the physical constraints of heterogeneous networks, including electricity, natural gas, and heat. These challenges are difficult to address due to the non-linearity of network constraints and the high complexity of multi-network coordination. This paper therefore proposes a novel Safe Reinforcement Learning (SRL) algorithm to optimize the multi-network constrained operation problem of ICES. First, a comprehensive ICES model is established considering integrated demand response (IDR), multiple energy devices, and network constraints. The multi-network operational optimization problem of ICES is then presented and reformulated as a constrained Markov Decision Process (C-MDP) that accounts for violations of physical network constraints. The proposed SRL algorithm, named Primal-Dual Twin Delayed Deep Deterministic Policy Gradient (PD-TD3), solves the C-MDP by employing a Lagrangian multiplier to penalize multi-network constraint violations, keeping violations within a tolerated range while avoiding an over-conservative, low-reward strategy. The proposed algorithm accurately estimates the cumulative reward and cost during training, thus achieving a fair balance between improving profits and reducing constraint violations in a privacy-protected environment with only partial information. A case study comparing the proposed algorithm with benchmark RL algorithms demonstrates its computational performance in increasing total profits and alleviating network constraint violations.
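The primal-dual idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's PD-TD3 implementation: the function names, the cost limit, the learning rate, and the specific projected dual-ascent rule are all assumptions chosen for illustration. The sketch shows only the generic Lagrangian mechanism, in which the multiplier grows while the estimated cumulative cost exceeds a tolerated limit and shrinks (never below zero) once it falls under that limit.

```python
# Generic Lagrangian (primal-dual) mechanism for a constrained MDP.
# All names and numbers here are hypothetical, not taken from the paper.

def penalized_reward(reward: float, cost: float, lam: float) -> float:
    """Reward seen by the policy (primal) update: reward minus weighted cost."""
    return reward - lam * cost


def dual_update(lam: float, est_cum_cost: float, cost_limit: float, lr: float) -> float:
    """Projected dual ascent on the multiplier.

    The multiplier increases when the estimated cumulative cost exceeds the
    tolerated limit and decreases otherwise; it is clipped at zero.
    """
    return max(0.0, lam + lr * (est_cum_cost - cost_limit))


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical sequence of estimated cumulative costs over training.
    for est_cost in [5.0, 4.0, 2.0, 1.0]:
        lam = dual_update(lam, est_cost, cost_limit=3.0, lr=0.1)
        print(f"estimated cost={est_cost:.1f}  multiplier={lam:.3f}")
```

In this sketch a larger multiplier makes violations more expensive for the policy, while letting the multiplier decay when costs are under the limit is what avoids a permanently over-conservative strategy.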

Sensor Activation Policy Optimization for Opacity Enforcement Based on Reinforcement Learning

Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning

Planning with Probabilistic Opacity and Transparency: A Computational Model of Opaque/Transparent Observations

Information-Theoretic Opacity-Enforcement in Markov Decision Processes

SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization

Adversarial Constrained Policy Optimization: Improving Constrained Reinforcement Learning by Adapting Budgets

Optimal Policies Search for Sensor Management

Dynamic Observation Policies in Observation Cost-Sensitive Reinforcement Learning

OMPO: A Unified Framework for RL under Policy and Dynamics Shifts

Optimal Observation Policy of Fault Diagnosis: A Reinforcement Learning Approach

Optimal Proactive Eavesdropping Scheme Based on Stackelberg Game Framework Against State-Secrecy Encoding: A Deep Reinforcement Learning Approach

Minimization of Sensor Activation in Discrete-Event Systems with Control Delays and Observation Delays

Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation

POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning

Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization

Multi-Network Constrained Operational Optimization in Community Integrated Energy Systems: A Safe Reinforcement Learning Approach

Adversarial Policy Optimization in Deep Reinforcement Learning

Supervisor synthesis for opacity enforcement in partially observed discrete event systems

State-wise Constrained Policy Optimization

How to Re-enable PDE Loss for Physical Systems Modeling Under Partial Observation

Multi-objective Sensor Management Method Based on Twin Delayed Deep Deterministic policy gradient algorithm