Abstract: Deep Actor-Critic algorithms, which combine Actor-Critic with deep neural networks (DNNs), have been among the most prevalent reinforcement learning algorithms for decision-making problems in simulated environments. However, existing deep Actor-Critic algorithms are not yet mature enough to solve realistic problems that have non-convex stochastic constraints and a high cost of interacting with the environment. In this paper, we propose a single-loop deep Actor-Critic (SLDAC) algorithmic framework for general constrained reinforcement learning (CRL) problems. In the actor step, the constrained stochastic successive convex approximation (CSSCA) method is applied to handle the non-convex stochastic objective and constraints. In the critic step, the critic DNNs are updated only once or a small, finite number of times per iteration, which reduces the algorithm to a single-loop framework; existing works instead require enough critic updates per iteration for the inner loop to converge sufficiently well. Moreover, the variance of the policy gradient estimate is reduced by reusing observations from the old policy. Together, the single-loop design and observation reuse reduce both the agent-environment interaction cost and the computational complexity. Although the single-loop design and observation reuse make the policy gradient estimate biased, we prove that SLDAC, started from a feasible initial point, converges almost surely to a Karush-Kuhn-Tucker (KKT) point of the original problem. Simulations show that SLDAC achieves superior performance with a much lower interaction cost.
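To make the CSSCA actor step more concrete, here is a minimal sketch of a generic CSSCA iteration on a toy problem. It is not the SLDAC implementation. Everything in it is an illustrative assumption: the two-dimensional toy objective and constraint, the hypothetical `noisy_oracle` standing in for the policy-gradient estimates that SLDAC would build from the critic DNNs and reused observations, the proximal weights `tau0`/`tau1`, and the step-size exponents. Each iteration recursively averages noisy value and gradient samples, minimizes a convex quadratic surrogate subject to a convex quadratic surrogate constraint, and then moves a small step `gamma` toward the surrogate solution. If the surrogate constraint cannot be satisfied, it falls back to a feasibility update.

```python
import numpy as np

# Toy problem (an assumption for illustration, not the paper's CRL setting):
#   minimize  f0(x) = ||x - A||^2   subject to  f1(x) = ||x||^2 - 1 <= 0,
# accessible only through noisy value and gradient samples.
A = np.array([2.0, 2.0])

def noisy_oracle(x, rng, sigma=0.3):
    """Hypothetical oracle: noisy values and gradients of f0 and f1 at x."""
    vals = np.array([np.sum((x - A) ** 2), np.sum(x ** 2) - 1.0])
    grads = np.stack([2.0 * (x - A), 2.0 * x])
    return (vals + sigma * rng.standard_normal(2),
            grads + sigma * rng.standard_normal(grads.shape))

def solve_surrogate(c, g0, g1, tau0=1.0, tau1=1.0):
    """Step d minimizing g0.d + tau0*|d|^2 s.t. c + g1.d + tau1*|d|^2 <= 0."""
    def step(lam):
        # Minimizer of the Lagrangian for a fixed multiplier lam >= 0.
        return -(g0 + lam * g1) / (2.0 * (tau0 + lam * tau1))

    def h(lam):
        # Surrogate constraint value at step(lam); non-increasing in lam.
        d = step(lam)
        return c + g1 @ d + tau1 * (d @ d)

    if c - (g1 @ g1) / (4.0 * tau1) > 0:
        # Surrogate infeasible: feasibility update minimizing the constraint surrogate.
        return -g1 / (2.0 * tau1)
    if h(0.0) <= 0:
        return step(0.0)  # constraint inactive
    lo, hi = 0.0, 1.0
    while h(hi) > 0 and hi < 1e8:  # bracket the multiplier
        hi *= 2.0
    for _ in range(100):  # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return step(hi)

def cssca(x0, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    f_est = np.zeros(2)             # recursive estimates of f0(x), f1(x)
    g_est = np.zeros((2, x.size))   # recursive estimates of their gradients
    for t in range(iters):
        rho = (t + 1) ** -0.6       # averaging weight
        gamma = (t + 1) ** -0.9     # step size, with gamma / rho -> 0
        vals, grads = noisy_oracle(x, rng)
        f_est = (1 - rho) * f_est + rho * vals
        g_est = (1 - rho) * g_est + rho * grads
        d = solve_surrogate(f_est[1], g_est[0], g_est[1])
        # x_{t+1} = (1 - gamma) * x_t + gamma * xbar_t, where xbar_t = x_t + d
        x = x + gamma * d
    return x

print(cssca([0.0, 0.0]))  # should approach [0.707, 0.707], the toy optimum
```

The starting point `[0.0, 0.0]` is feasible, mirroring the abstract's feasible-initial-point assumption. Setting `tau0 = tau1 = 1` makes the surrogates exact for these quadratic toy functions, so only the sampling noise separates the surrogate solution from the true optimum.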

Single-Loop Federated Actor-Critic across Heterogeneous Environments

Finite-Time Analysis of Decentralized Single-Timescale Actor-Critic

Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments

Finite-Time Analysis of On-Policy Heterogeneous Federated Reinforcement Learning

Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning

A fuzzy Actor–Critic reinforcement learning network

DearFSAC: An Approach to Optimizing Unreliable Federated Learning via Deep Reinforcement Learning

A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence

Towards Cost-Efficient Federated Multi-agent RL with Learnable Aggregation

A Novel Federated Reinforcement Learning Algorithm with Historical Model Update Momentum

F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning

Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety

A Robust Mean-Field Actor-Critic Reinforcement Learning Against Adversarial Perturbations on Agent States

A Fair Federated Learning Framework with Reinforcement Learning

Communication-Efficient Soft Actor-Critic Policy Collaboration via Regulated Segment Mixture

FedAEB: Deep Reinforcement Learning Based Joint Client Selection and Resource Allocation Strategy for Heterogeneous Federated Learning

Federated Reinforcement Learning with Environment Heterogeneity

FedMC: Federated Reinforcement Learning on the Edge with Meta-Critic Networks

FedHQL: Federated Heterogeneous Q-Learning

Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

CAESAR: Enhancing Federated RL in Heterogeneous MDPs through Convergence-Aware Sampling with Screening