Abstract: The growing IoT landscape requires effective server deployment strategies to meet demands such as real-time processing and energy efficiency. This is complicated by the heterogeneity and dynamism of both applications and servers. To address these challenges, we propose ReinFog, a modular distributed software framework that uses Deep Reinforcement Learning (DRL) for adaptive resource management across edge/fog and cloud environments. ReinFog enables the practical development and deployment of various centralized and distributed DRL techniques for resource management in these environments. It also supports integrating native and library-based DRL techniques for diverse IoT application scheduling objectives. In addition, ReinFog allows deployment configurations to be customized for different DRL techniques, including the number and placement of DRL Learners and DRL Workers in large-scale distributed systems. We also propose MADCP, a novel Memetic Algorithm for DRL Component Placement (e.g., of DRL Learners and DRL Workers) in ReinFog, which combines the strengths of the Genetic Algorithm, the Firefly Algorithm, and Particle Swarm Optimization. Experiments show that the DRL mechanisms developed within ReinFog significantly improve the implementation of both centralized and distributed DRL techniques. These advancements yield notable improvements in IoT application performance, reducing response time by 45%, energy consumption by 39%, and weighted cost by 37%, while maintaining minimal scheduling overhead. ReinFog also scales well: increasing the number of DRL Workers from 1 to 30 adds only 0.3 seconds to startup time and around 2 MB of RAM per Worker. The proposed MADCP for DRL component placement further accelerates the convergence rate of DRL techniques by up to 38%.

Multi-Stream Scheduling of Inference Pipelines on Edge Devices - a DRL Approach

TATA: Throughput-Aware TAsk Placement in Heterogeneous Stream Processing with Deep Reinforcement Learning

A Framework for Mapping DRL Algorithms with Prioritized Replay Buffer onto Heterogeneous Platforms

RESPECT: Reinforcement Learning based Edge Scheduling on Pipelined Coral Edge TPUs

Accelerating Exact Combinatorial Optimization via RL-based Initialization -- A Case Study in Scheduling

Efficient CUDA stream management for multi-DNN real-time inference on embedded GPUs

TF-DDRL: A Transformer-enhanced Distributed DRL Technique for Scheduling IoT Applications in Edge and Cloud Computing Environments

SGPRS: Seamless GPU Partitioning Real-Time Scheduler for Periodic Deep Learning Workloads

Deep Reinforcement Learning-based scheduling for optimizing system load and response time in edge and fog computing environments

A Deep Reinforcement Learning Approach to Multi-Component Job Scheduling in Edge Computing

A Co-Scheduling Framework for DNN Models on Mobile and Edge Devices with Heterogeneous Hardware

BCEdge: SLO-Aware DNN Inference Services With Adaptive Batch-Concurrent Scheduling on Edge Devices

Optimal Flow Admission Control in Edge Computing via Safe Reinforcement Learning

Online Scheduling of Coflows by Attention-Empowered Scalable Deep Reinforcement Learning

Adaptive Stream Processing on Edge Devices through Active Inference

RT-mDL: Supporting Real-Time Mixed Deep Learning Tasks on Edge Platforms

ReinFog: A DRL Empowered Framework for Resource Management in Edge and Cloud Computing Environments

GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing

BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms

Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning