Abstract: Conventional deep learning (DL) model compression and scaling methods focus on altering the model's components, impacting the results across all samples uniformly. However, since samples vary in difficulty, a dynamic model that adapts computation based on sample complexity offers a novel perspective for compression and scaling. Despite this potential, existing dynamic models are typically monolithic and model-specific, limiting their generalizability as broad compression and scaling methods. Additionally, most deployed DL systems are fixed, unable to adjust their scale once deployed and, therefore, cannot adapt to the varying real-time demands. This paper introduces DyCE, a dynamically configurable system that can adjust the performance-complexity trade-off of a DL model at runtime without requiring re-initialization or redeployment on inference hardware. DyCE achieves this by adding small exit networks to intermediate layers of the original model, allowing computation to terminate early if acceptable results are obtained. DyCE also decouples the design of an efficient dynamic model, facilitating easy adaptation to new base models and potential general use in compression and scaling. We also propose methods for generating optimized configurations and determining the types and positions of exit networks to achieve desired performance and complexity trade-offs. By enabling simple configuration switching, DyCE provides fine-grained performance tuning in real-time. We demonstrate the effectiveness of DyCE through image classification tasks using deep convolutional neural networks (CNNs). DyCE significantly reduces computational complexity by 23.5% for ResNet152 and 25.9% for ConvNextv2-tiny on ImageNet, with accuracy reductions of less than 0.5%.

What problem does this paper attempt to address?

This paper addresses two key problems in the compression and real-time scaling of deep learning (DL) models:

1. **Limitations of Static Models**:
   - Existing DL compression and scaling methods mainly alter the model's components, which changes the results for all samples uniformly. Samples differ in difficulty, however, and a static model cannot adjust its computation to sample complexity.
   - Deployed DL systems are usually fixed and cannot adjust their scale once deployed, so they cannot adapt to changing real-time computational demands.
2. **Limitations of Existing Dynamic Models**:
   - Existing dynamic models are usually monolithic and model-specific, which makes them hard to apply to other models and limits their potential as general-purpose compression and scaling methods.
   - Most dynamic models are tightly coupled in design: the backbone model, exit branches, and exit strategies are interdependent, which makes them difficult to apply to other or future models.

To solve these problems, the authors propose the **DyCE (Dynamically Configurable Exiting)** framework. The main objectives of DyCE are:

- **Dynamically Adjust the Trade-off between Performance and Complexity**: By configuring exit points dynamically during inference, DyCE can adjust the performance and complexity of the model at runtime without re-initializing or redeploying the model.
- **Improve Resource Utilization Efficiency**: Small exit networks added at intermediate layers allow computation to terminate early once an acceptable result is obtained, which avoids spending computation on samples that do not need it.
- **Flexibility and Universality**: DyCE can easily adapt to new base models and supports fine-grained adjustment between different performance and complexity goals.

Specifically, DyCE achieves these goals in the following ways:

- **Split the Backbone Network**: Divide a pre-trained deep neural network into multiple segments and attach one or more exit networks after each segment.
- **Dynamic Exit Controller**: Use a configurable exit controller to decide whether to terminate the computation at a given exit point. The controller adjusts the exit strategy according to a predefined configuration.
- **Optimized Configuration Generation**: Propose two search algorithms that generate configurations for DyCE. Each configuration specifies which exit networks to use and the corresponding thresholds needed to reach the desired performance-complexity trade-off.

With these methods, experiments on the ImageNet dataset show that DyCE reduces the computational complexity of ResNet152 and ConvNextv2-tiny by 23.5% and 25.9% respectively, while keeping the accuracy drop below 0.5%.
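The split backbone and configurable exit controller described above can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation. The function `early_exit_predict`, the toy `sharpen` segment, the toy `head` classifier, the configuration names, and the rule "exit when the maximum class probability reaches a threshold" are all assumptions made for illustration. The point is only to show how swapping a small configuration object changes how much of the model runs, without rebuilding the model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Features = List[float]
Probs = List[float]


def early_exit_predict(
    x: Features,
    segments: Sequence[Callable[[Features], Features]],
    exits: Sequence[Callable[[Features], Probs]],
    final_head: Callable[[Features], Probs],
    config: Dict[int, float],
) -> Tuple[int, int]:
    """Return (predicted_class, number_of_segments_executed).

    `config` maps an exit index to a confidence threshold (an assumed
    exit rule). Exits that are absent from `config` are never evaluated.
    """
    h = x
    for i, segment in enumerate(segments):
        h = segment(h)
        if i in config and i < len(exits):
            probs = exits[i](h)
            confidence = max(probs)
            if confidence >= config[i]:
                # Confident enough: stop here and skip the remaining segments.
                return probs.index(confidence), i + 1
    # No exit fired: fall through to the original model's output head.
    probs = final_head(h)
    return probs.index(max(probs)), len(segments)


def sharpen(h: Features) -> Features:
    """Toy backbone segment: doubles the evidence carried by the features."""
    return [2.0 * v for v in h]


def head(h: Features) -> Probs:
    """Toy two-class classifier: softmax over the logits [h0, -h0]."""
    p = 1.0 / (1.0 + math.exp(-2.0 * h[0]))
    return [p, 1.0 - p]


# Hypothetical three-segment backbone with an exit after segments 0 and 1.
SEGMENTS = [sharpen, sharpen, sharpen]
EXITS = [head, head]

# Hypothetical configurations: switching between them at runtime changes
# the trade-off without touching SEGMENTS or EXITS.
CONFIGS: Dict[str, Dict[int, float]] = {
    "fast": {0: 0.8, 1: 0.8},
    "accurate": {0: 0.95, 1: 0.95},
    "full": {},  # all exits disabled: behaves like the original model
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        easy = early_exit_predict([0.5], SEGMENTS, EXITS, head, cfg)
        hard = early_exit_predict([0.05], SEGMENTS, EXITS, head, cfg)
        print(f"{name}: easy sample -> {easy}, hard sample -> {hard}")
```

In this toy setup, the easy sample leaves after one segment under the `fast` configuration and after two under `accurate`, while the hard sample always runs all three segments. A real system would replace the toy callables with backbone stages and trained exit networks, and would obtain the thresholds from a search procedure rather than by hand.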

DyCE: Dynamically Configurable Exiting for Deep Learning Compression and Real-time Scaling

MCMC: Multi-Constrained Model Compression Via One-Stage Envelope Reinforcement Learning.

Condense: A Framework for Device and Frequency Adaptive Neural Network Models on the Edge.

DiCENet: Dimension-wise Convolutions for Efficient Networks

Dynamic and Adaptive Threshold for DNN Compression from Scratch.

ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model

DynExit: A Dynamic Early-Exit Strategy for Deep Residual Networks

DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Vision Transformers

AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices

Tiny Models are the Computational Saver for Large Models

Dynamic Semantic Compression for CNN Inference in Multi-access Edge Computing: A Graph Reinforcement Learning-based Autoencoder

Incremental Training and Group Convolution Pruning for Runtime DNN Performance Scaling on Heterogeneous Embedded Platforms

Computational Efficient Width-Wise Early Exiting in Wireless Communication Systems

DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices

LDCNet: A Lightweight Multi-Scale Convolutional Neural Network Using Local Dense Connectivity for Image Recognition

Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design

CDC: Classification Driven Compression for Bandwidth Efficient Edge-Cloud Collaborative Deep Learning

Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression

Cavs: An Efficient Runtime System For Dynamic Neural Networks

Deep Learning Model Compression Techniques: Advances, Opportunities, and Perspective

Deep learning model compression using network sensitivity and gradients