Abstract: Inverter-based Volt-VAR control (IB-VVC) can be simplified as a single-period optimization problem. However, recent deep reinforcement learning (DRL) methods formulate IB-VVC as a Markov decision process (MDP) and solve it as a multi-period optimization problem, which complicates the problem and degrades the performance of DRL algorithms. To avoid this, this paper formulates IB-VVC as a one-step MDP and designs a single-period DRL algorithm to solve it. This simplifies the DRL approach considerably, accelerates convergence, and improves control performance. VVC has two objectives, eliminating voltage violations and minimizing power loss, and the two have different profiles. Existing DRL methods use one critic neural network to approximate both objectives together without considering their distinct properties, which makes the critic harder to train. To alleviate this, we design a two-critic approach that approximates the two objective functions with two separate critic neural networks. Its better approximation capability further accelerates convergence and improves the control performance of the DRL method. Based on the single-period two-critic (TC) DRL approach, we design two DRL algorithms: 1) TC-DDPG with a deterministic policy and 2) TC-SAC with a stochastic policy. We further extend TC-DRL to a multi-agent setting to show that it scales well to multi-agent DRL algorithms. Simulations on 33-bus and 69-bus test distribution networks demonstrate the superiority of the proposed approach for both single-agent and multi-agent DRL algorithms.

This paper proposes a two-critic deep reinforcement learning (TC-DRL) approach for inverter-based volt-var control (IB-VVC) in active distribution networks. Because the two objectives of VVC, minimizing power loss and eliminating voltage violations, have different mathematical properties, we use two critics to approximate them separately, which reduces the learning difficulty of each critic. The TC-DRL approach works with many actor-critic DRL algorithms for centralized IB-VVC problems, and two centralized DRL algorithms are designed as examples. For decentralized IB-VVC, we extend the approach to a multi-agent TC-DRL approach and further simplify it by having all agents share the same centralized two-critic. Extensive simulation experiments show that the two proposed centralized TC-DRL algorithms require fewer iterations and return better results than recent DRL algorithms, and that the multi-agent TC-DRL algorithms work well for decentralized IB-VVC problems under different limited real-time measurement conditions.
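The one-step, two-critic idea described above can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the voltage band of 0.95 to 1.05 p.u., the penalty weight, the linear critic, and all function and class names are assumptions made for illustration. The sketch shows two points: in a one-step MDP each critic regresses directly onto its own immediate reward, with no bootstrapped next-state term, and the actor is scored by a weighted sum of the two critic outputs.

```python
def voltage_violation(voltages, v_min=0.95, v_max=1.05):
    """Total per-unit magnitude outside the assumed band [v_min, v_max]."""
    return sum(max(v - v_max, 0.0) + max(v_min - v, 0.0) for v in voltages)


def split_reward(power_loss, voltages):
    """Keep the two reward components separate, one per critic."""
    r_loss = -power_loss                    # objective 1: minimize power loss
    r_volt = -voltage_violation(voltages)   # objective 2: eliminate violations
    return r_loss, r_volt


class LinearCritic:
    """Hypothetical stand-in for a critic network: Q(x) = w . x + b."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, target):
        # One-step MDP: the target is the immediate reward itself,
        # so there is no gamma * Q(next_state, next_action) term.
        err = self.predict(x) - target
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return err * err


def actor_objective(q_loss, q_volt, penalty=50.0):
    """Score an action by combining both critics (penalty weight is assumed)."""
    return q_loss + penalty * q_volt


# Toy usage: x stands for concatenated (state, action) features.
x = [1.0, 0.5]
r_loss, r_volt = split_reward(power_loss=0.3, voltages=[1.0, 1.06, 0.94])
critic_loss, critic_volt = LinearCritic(2), LinearCritic(2)
for _ in range(200):
    critic_loss.update(x, r_loss)
    critic_volt.update(x, r_volt)
score = actor_objective(critic_loss.predict(x), critic_volt.predict(x))
```

Keeping the two targets separate means each critic fits a single, simpler function, which is the motivation the abstract gives for the two-critic design.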

Two-Critic Deep Reinforcement Learning for Inverter-based Volt-Var Control in Active Distribution Networks
