Abstract: The increasing computational demands of Deep Reinforcement Learning (DRL) models, particularly on embedded systems in autonomous vehicles and drones, present significant challenges owing to the size and complexity of their neural networks. Previous DRL compression strategies have predominantly focused on unstructured pruning, which reduces model size effectively but requires specialized hardware to translate that reduction into computational acceleration. Conversely, DRL models compressed with structured pruning can be accelerated on standard hardware, but they typically lose performance at higher pruning rates because of structural constraints. In response to these challenges, this paper introduces a structured pruning methodology for DRL models combined with scaled policy constraints (SPC). Our approach overcomes the performance limitations of conventional structured pruning, achieving high pruning rates while maintaining robust model performance. Performance is restored after pruning by fine-tuning with SPC and applying structural regularization, which keeps decision-making efficient at a minimal computational burden. Extensive evaluations on the D4RL benchmark and in a drone control simulation environment confirm the effectiveness of our method. Performance holds even at high pruning rates: the normalized score decreases by less than 2% at 90% pruning on D4RL, and the cumulative reward is preserved at 87% pruning in the drone control simulation. Our approach also enables considerable computational acceleration on standard hardware. We implemented our method on the NVIDIA Jetson Xavier NX board and achieved a 2.5-fold speed-up on the NVIDIA Volta GPU and more than a 2-fold speed-up on the NVIDIA Carmel ARMv8.2 CPU. These outcomes highlight our method's suitability for real-time, resource-constrained applications.
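To make the distinction between structured and unstructured pruning concrete, the following is a minimal sketch of generic magnitude-based structured pruning on a single hidden layer. It is not the method proposed in the paper: the scoring rule (L2 norm of incoming weights), the function names `prune_hidden_units` and `forward`, and the toy layer sizes are all assumptions chosen for illustration, and the SPC fine-tuning and structural regularization steps are omitted. The point it illustrates is that removing whole hidden units yields smaller dense matrices, which is why standard hardware can run the pruned network faster without sparse-matrix support.

```python
import math
import random


def prune_hidden_units(w1, b1, w2, keep_ratio):
    """Remove whole hidden units (illustrative, magnitude-based).

    w1: hidden x in weights (one row per hidden unit)
    b1: hidden biases
    w2: out x hidden weights (one column per hidden unit)
    keep_ratio: fraction of hidden units to keep (0.1 = 90% pruning)
    """
    n_hidden = len(w1)
    n_keep = max(1, round(n_hidden * keep_ratio))
    # Assumed importance score: L2 norm of each unit's incoming weights.
    scores = [math.sqrt(sum(v * v for v in row)) for row in w1]
    ranked = sorted(range(n_hidden), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    # Drop rows of w1 / entries of b1, and the matching columns of w2.
    w1_pruned = [w1[i] for i in keep]
    b1_pruned = [b1[i] for i in keep]
    w2_pruned = [[row[i] for i in keep] for row in w2]
    return w1_pruned, b1_pruned, w2_pruned


def forward(w1, b1, w2, x):
    """Two-layer MLP with ReLU; works for any hidden width."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


# Toy policy network: 4 state inputs, 10 hidden units, 2 action outputs.
rng = random.Random(0)
n_in, n_hidden, n_out = 4, 10, 2
w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]

# Keep 10% of hidden units, i.e. a 90% structured pruning rate.
w1_p, b1_p, w2_p = prune_hidden_units(w1, b1, w2, keep_ratio=0.1)
action = forward(w1_p, b1_p, w2_p, [1.0, 0.0, 0.0, 0.0])
```

Unstructured pruning would instead zero individual entries of `w1` and `w2` while leaving their shapes unchanged, so a dense matrix multiply does the same amount of work unless the hardware or library exploits sparsity.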

LRP-based Policy Pruning and Distillation of Reinforcement Learning Agents for Embedded Systems

LRP-based Network Pruning and Policy Distillation of Robust and Non-Robust DRL Agents for Embedded Systems

Real-time Policy Distillation in Deep Reinforcement Learning

Pruning With Scaled Policy Constraints for Light-Weight Reinforcement Learning

Policy Compression for Intelligent Continuous Control on Low-Power Edge Devices

Double Sparse Deep Reinforcement Learning Via Multilayer Sparse Coding and Nonconvex Regularized Pruning

RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning

Compressing Deep Reinforcement Learning Networks with a Dynamic Structured Pruning Method for Autonomous Driving

Achieving efficient interpretability of reinforcement learning via policy distillation and selective input gradient regularization

Effective Interpretable Policy Distillation via Critical Experience Point Identification

RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch

Leveraging Knowledge Distillation for Efficient Deep Reinforcement Learning in Resource-Constrained Environments

Distilling Deep RL Models Into Interpretable Neuro-Fuzzy Systems

Neural-to-Tree Policy Distillation with Policy Improvement Criterion

Efficient Multi-agent Navigation with Lightweight DRL Policy

Eliminating Primacy Bias in Online Reinforcement Learning by Self-Distillation

Learning Navigation Policies for Mobile Robots in Deep Reinforcement Learning with Random Network Distillation

The Impact of Quantization and Pruning on Deep Reinforcement Learning Models

Adversary Agnostic Robust Deep Reinforcement Learning

Distillation Strategies for Proximal Policy Optimization

Deep Reinforcement Learning Using Least‐squares Truncated Temporal‐difference