Abstract: Rapid urbanization and the increasing use of distributed renewable energy resources have imposed a significant burden on power networks. Smart buildings equipped with artificial intelligence technology can play a pivotal role in energy management, ultimately enhancing energy efficiency and voltage quality. However, ensuring voltage stability within large-scale smart building systems is challenging due to the coexistence of diverse energy sources and the fluctuating nature of renewable energy. This paper proposes a safe multi-energy management framework, based on centralized training and online decentralized execution, for large-scale smart buildings in distribution networks. The energy management problem is formulated as a safety-augmented Markov decision process, which is intractable for dynamic programming because of its large continuous state space. To address this issue and improve convergence speed and training stability, a safety-augmented constrained multi-agent reinforcement learning algorithm based on reward extrapolation is proposed. In this algorithm, hazard values are introduced to augment multi-agent reinforcement learning algorithms that do not otherwise account for safety, so that safety constraints are met. A novel reward network is designed by imitating the underlying intentions of experts to ensure the rationality of the reward function for multi-objective tasks. Additionally, the loss function for estimating the Q-network is redesigned during the training process to guarantee effective convergence. Theoretical analysis is conducted to provide a convergence guarantee. Numerical case studies based on actual data validate the effectiveness and scalability of the approach, showing that smart buildings can achieve superior energy management performance while ensuring voltage safety for distribution networks. The source code of the proposed algorithm will be available at https://github.com/SYiyun/CMARL-EX.

Multi-Agent Q-Value Mixing Network with Covariance Matrix Adaptation Strategy for the Voltage Regulation Problem

Target-Value-Competition-Based Multi-Agent Deep Reinforcement Learning Algorithm for Distributed Nonconvex Economic Dispatch

Hybrid Multi-Agent Reinforcement Learning for Electric Vehicle Resilience Control Towards a Low-Carbon Transition

Multi-Agent Deep Reinforcement Learning for Voltage Control with Coordinated Active and Reactive Power Optimization

Robust Regional Coordination of Inverter-Based Volt/Var Control Via Multi-Agent Deep Reinforcement Learning

Large-scale deep reinforcement learning method for energy management of power supply units considering regulation mileage payment

Population-based Multi-agent Evaluation for Large-scale Voltage Control

Deep Reinforcement Learning Based Coordinated Voltage Control in Smart Distribution Network

Energy Management Based on Safe Multi-Agent Reinforcement Learning for Smart Buildings in Distribution Networks

Multi-Agent Reinforcement Learning with Safety Layer for Active Voltage Control

Multi-Agent Safe Graph Reinforcement Learning for PV Inverters-Based Real-Time Decentralized Volt/Var Control in Zoned Distribution Networks

Consensus Multi-Agent Reinforcement Learning for Volt-VAR Control in Power Distribution Networks

Multiagent-Based Reinforcement Learning for Optimal Reactive Power Dispatch

Learning Multi-Agent Cooperation via Considering Actions of Teammates

Deep Reinforcement Learning Based Volt-VAR Optimization in Smart Distribution Systems

Secondary Voltage Collaborative Control of Distributed Energy System via Multi-Agent Reinforcement Learning

A Scalable Network-Aware Multi-Agent Reinforcement Learning Framework for Decentralized Inverter-based Voltage Control

Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization

Online Multi-Agent Reinforcement Learning for Decentralized Inverter-Based Volt-VAR Control

Decentralized multi-objective cloud energy storage operation control with deep reinforcement learning

Multi-timescale voltage control for distribution system based on multi-agent deep reinforcement learning