Abstract: Resource management problems in systems and networks frequently take the form of difficult online decision tasks whose proper solution depends on an understanding of the workload and the environment, and solving them well enables smooth use of mobile edge and cloud resources. Efficient job scheduling in edge environments is difficult because resources are geographically dispersed, resource capacity is constrained, tasks are unpredictable, and the network is hierarchical. Existing heuristic-based methods lack generality and cannot adapt quickly, and thus cannot solve such problems optimally. The advantage actor–critic (A2C) method can adapt quickly to dynamic conditions from relatively little data, and deep reinforcement learning (DRL) agents can learn rapidly from their interactions with the environment to make better decisions. We therefore present A2C-DRL, a real-time task scheduling technique for stochastic edge–cloud environments that enables decentralized learning and simultaneous scheduling of work across multiple servers. To produce efficient scheduling decisions, we design reward values for the various resources and model the update policy, the server resource scheduling method, and the policy learning method. The model is adaptive and includes several hyperparameters that can be adjusted to the application requirements. We evaluate the load balancing capability of the model by introducing a load balancing factor. Experiments on real datasets show that the proposed A2C-DRL method outperforms seven state-of-the-art algorithms in terms of reward value, task rejection, and load balancing factor.

Averaged-A3C for Asynchronous Deep Reinforcement Learning

Towards Understanding Asynchronous Advantage Actor-critic: Convergence and Linear Speedup

Recursive Least Squares Advantage Actor-Critic Algorithms

Implementation of value based curiosity mechanism in Reinforcement Learning algorithm based on A3C

Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning

Double A3C: Deep Reinforcement Learning on OpenAI Gym Games

Deep Reinforcement Learning with Importance Weighted A3C for QoE enhancement in Video Delivery Services

ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

DSAC-T: Distributional Soft Actor-Critic with Three Refinements

An Advanced Actor-Critic Algorithm for Training Video Game AI

Applying Online Expert Supervision in Deep Actor-Critic Reinforcement Learning

A Comparative Study of Deep Reinforcement Learning Models: DQN vs PPO vs A2C

Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning

A2C-DRL: Dynamic Scheduling for Stochastic Edge-Cloud Environments Using A2C and Deep Reinforcement Learning

Average-Reward Reinforcement Learning with Trust Region Methods

Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU

Generalizing soft actor-critic algorithms to discrete action spaces

A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence

Optimal Elevator Group Control via Deep Asynchronous Actor–Critic Learning

RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning

Off-Policy Average Reward Actor-Critic with Deterministic Policy Search