DyCE: Dynamically Configurable Exiting for Deep Learning Compression and Real-time Scaling

Qingyuan Wang, Barry Cardiff, Antoine Frappé, Benoit Larras, Deepu John
2024-08-17
Abstract: Conventional deep learning (DL) model compression and scaling methods focus on altering the model's components, impacting the results across all samples uniformly. However, since samples vary in difficulty, a dynamic model that adapts computation based on sample complexity offers a novel perspective for compression and scaling. Despite this potential, existing dynamic models are typically monolithic and model-specific, limiting their generalizability as broad compression and scaling methods. Additionally, most deployed DL systems are fixed, unable to adjust their scale once deployed and, therefore, cannot adapt to the varying real-time demands. This paper introduces DyCE, a dynamically configurable system that can adjust the performance-complexity trade-off of a DL model at runtime without requiring re-initialization or redeployment on inference hardware. DyCE achieves this by adding small exit networks to intermediate layers of the original model, allowing computation to terminate early if acceptable results are obtained. DyCE also decouples the design of an efficient dynamic model, facilitating easy adaptation to new base models and potential general use in compression and scaling. We also propose methods for generating optimized configurations and determining the types and positions of exit networks to achieve desired performance and complexity trade-offs. By enabling simple configuration switching, DyCE provides fine-grained performance tuning in real-time. We demonstrate the effectiveness of DyCE through image classification tasks using deep convolutional neural networks (CNNs). DyCE significantly reduces computational complexity by 23.5% for ResNet152 and 25.9% for ConvNextv2-tiny on ImageNet, with accuracy reductions of less than 0.5%.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper aims to solve two key problems in the compression and real-time scaling of deep learning (DL) models:

1. **Limitations of Static Models**:
   - Existing DL model compression and scaling methods mainly change the model's components, which affects the results for all samples uniformly. However, samples differ in difficulty, and a static model cannot adjust its computation to the complexity of each sample.
   - Once deployed, DL systems are usually fixed and cannot adjust their scale to real-time requirements, so they cannot adapt to changing computational demands.
2. **Limitations of Existing Dynamic Models**:
   - Existing dynamic models are usually monolithic and model-specific, which makes them hard to apply to other models and limits their potential as general-purpose compression and scaling methods.
   - Most dynamic models are tightly coupled: the backbone model, exit branches, and exit strategies depend on one another, making it difficult to apply them flexibly to other or future models.

To solve these problems, the authors propose the **DyCE (Dynamically Configurable Exiting)** framework. The main objectives of DyCE are:

- **Dynamically Adjust the Trade-off between Performance and Complexity**: By configuring which exit points are used during inference, DyCE can adjust the model's performance and complexity at runtime without re-initializing or redeploying the model.
- **Improve Resource Utilization Efficiency**: Small exit networks attached to intermediate layers allow computation to terminate early once an acceptable result is obtained, avoiding wasted computation.
- **Flexibility and Universality**: DyCE adapts easily to new base models and supports fine-grained adjustment between different performance and complexity targets.

Specifically, DyCE achieves these goals as follows:

- **Split the Backbone Network**: A pre-trained deep neural network is divided into multiple segments, and one or more exit networks are attached after each segment.
- **Dynamic Exit Controller**: A configurable exit controller decides whether to terminate computation at a given exit point. It adjusts the exit strategy according to a predefined configuration.
- **Optimized Configuration Generation**: The authors propose two search algorithms that generate configurations for DyCE. A configuration selects which exit functions are used and sets their thresholds to achieve the desired performance-complexity trade-off.

With these methods, experiments on the ImageNet dataset show that DyCE reduces the computational complexity of ResNet152 and ConvNextv2-tiny by 23.5% and 25.9%, respectively, with an accuracy drop of less than 0.5%.
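The runtime behaviour of a segmented backbone with a configurable exit controller can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Stage` structure, the `run_with_config` function, and the use of the maximum class probability as the exit criterion are all hypothetical, since the summary only says that a configuration selects exit functions and thresholds. Here a configuration is just a list of per-exit thresholds, with `None` switching an exit off, so changing the trade-off at runtime amounts to passing a different list while the model itself stays untouched.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Probs = List[float]


@dataclass
class Stage:
    """One backbone segment plus an optional exit head (hypothetical structure, not DyCE's API)."""
    segment: Callable[[object], object]              # backbone segment: features -> features
    exit_head: Optional[Callable[[object], Probs]]   # small exit network: features -> class probabilities
    segment_cost: float                              # e.g. MACs of the segment
    exit_cost: float = 0.0                           # MACs of the exit head


def run_with_config(
    stages: Sequence[Stage],
    final_head: Callable[[object], Probs],
    final_cost: float,
    thresholds: Sequence[Optional[float]],
    x: object,
) -> Tuple[int, int, float]:
    """Return (predicted class, index of the exit used, accumulated cost).

    thresholds[i] is the confidence threshold of exit i (assumed criterion:
    max class probability); None switches that exit off. An exit index of
    len(stages) means the sample ran through the whole backbone.
    """
    if len(thresholds) != len(stages):
        raise ValueError("need exactly one threshold (or None) per stage")
    feats, cost = x, 0.0
    for i, (stage, thr) in enumerate(zip(stages, thresholds)):
        feats = stage.segment(feats)
        cost += stage.segment_cost
        if stage.exit_head is None or thr is None:
            continue  # no exit here, or disabled by the current configuration
        probs = stage.exit_head(feats)
        cost += stage.exit_cost
        conf = max(probs)
        if conf >= thr:  # confident enough: stop early
            return probs.index(conf), i, cost
    probs = final_head(feats)  # remaining layers + original classifier
    cost += final_cost
    return probs.index(max(probs)), len(stages), cost


# Toy usage: two stages whose exit heads return fixed probabilities.
stages = [
    Stage(lambda f: f + 1, lambda f: [0.6, 0.4], segment_cost=10.0, exit_cost=1.0),
    Stage(lambda f: f + 1, lambda f: [0.2, 0.8], segment_cost=10.0, exit_cost=1.0),
]
final_head = lambda f: [0.1, 0.9]
print(run_with_config(stages, final_head, 20.0, [0.5, 0.5], 0))  # (0, 0, 11.0)
print(run_with_config(stages, final_head, 20.0, [0.7, 0.7], 0))  # (1, 1, 22.0)
```

With the lower thresholds the sample leaves at the first exit after 11 cost units; raising them pushes it to the second exit, and disabling both exits runs the full backbone. Only the threshold list changes between these calls.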
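Because exit decisions depend only on each exit's output, a candidate configuration can be scored offline by replaying recorded exit outputs, which keeps configuration search cheap. The sketch below assumes each sample's per-exit confidence and correctness have already been recorded on a validation set. `evaluate_config` and `grid_search` are hypothetical names, and exhaustive grid search is a simple stand-in: it is not either of the two search algorithms proposed in the paper, and its cost grows as |grid|^(number of exits).

```python
import itertools
from typing import List, Optional, Sequence, Tuple

# One recorded sample: ([(confidence, correct) for each exit], final_classifier_correct)
Sample = Tuple[List[Tuple[float, bool]], bool]


def evaluate_config(
    samples: Sequence[Sample],
    seg_costs: Sequence[float],
    exit_costs: Sequence[float],
    final_cost: float,
    thresholds: Sequence[Optional[float]],
) -> Tuple[float, float]:
    """Replay a configuration on recorded exit outputs; return (accuracy, mean cost)."""
    n_correct, total_cost = 0, 0.0
    for exit_outputs, final_correct in samples:
        cost, exited = 0.0, False
        for i, thr in enumerate(thresholds):
            cost += seg_costs[i]
            if thr is None:
                continue  # exit disabled: its head is never run, so it costs nothing
            cost += exit_costs[i]
            conf, correct = exit_outputs[i]
            if conf >= thr:
                n_correct += correct
                exited = True
                break
        if not exited:
            cost += final_cost  # sample ran through the rest of the backbone
            n_correct += final_correct
        total_cost += cost
    return n_correct / len(samples), total_cost / len(samples)


def grid_search(samples, seg_costs, exit_costs, final_cost, grid, budget):
    """Try every threshold combination drawn from `grid` and return the most
    accurate (accuracy, mean cost, thresholds) within `budget`, or None.
    Ties in accuracy are broken by lower mean cost."""
    best = None
    for config in itertools.product(grid, repeat=len(seg_costs)):
        acc, cost = evaluate_config(samples, seg_costs, exit_costs, final_cost, config)
        if cost <= budget and (best is None or (acc, -cost) > (best[0], -best[1])):
            best = (acc, cost, config)
    return best


# Four recorded samples for a model with two exits (toy numbers, not paper data).
samples = [
    ([(0.90, True), (0.95, True)], True),     # easy
    ([(0.85, False), (0.90, True)], True),    # confidently wrong at exit 0
    ([(0.50, False), (0.60, False)], True),   # needs the full backbone
    ([(0.55, False), (0.70, False)], False),  # wrong everywhere
]
for budget in (40.0, 30.0):
    print(budget, grid_search(samples, [10.0, 10.0], [1.0, 1.0], 20.0, [None, 0.8], budget))
# 40.0 (0.75, 31.0, (None, 0.8))
# 30.0 (0.5, 26.0, (0.8, None))
```

Running the search for several budgets yields a small table of configurations. Switching between them at runtime changes only the thresholds, not the model weights, which is consistent with the paper's claim that DyCE can change its trade-off without re-initialization or redeployment.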