Abstract: Deep neural network (DNN) model compression is a popular and important optimization method for efficient and fast hardware acceleration. However, a compressed model is usually fixed and cannot tune its computing complexity (i.e., latency in hardware) on the fly in response to dynamic latency requirements, workloads, and computing hardware resource allocation. To address this challenge, dynamic DNNs that adapt their computing structure at run time have been constructed by training with a cross-entropy objective function computed over multiple subnets sampled from the supernet. Our investigations in this work show that the performance of dynamic inference relies heavily on the quality of subnet sampling. To construct a dynamic DNN with multiple high-quality subnets, we propose a progressive subnetwork searching framework that incorporates several new techniques, including trainable noise ranking, channel-group sampling, selective fine-tuning, and subnet filtering. Our proposed framework gives the target dynamic DNN higher accuracy for all subnets than prior works on both the CIFAR-10 and ImageNet datasets. Specifically, compared with the universally slimmable network (US-Net), our method achieves an average accuracy gain of 0.9% for AlexNet, 2.5% for ResNet18, 1.1% for VGG11, and 0.58% for MobileNetV1 on the ImageNet dataset. Moreover, to demonstrate run-time tuning of the computing latency of a dynamic DNN in a real computing system, we have deployed our constructed dynamic networks on an NVIDIA Titan graphics processing unit (GPU) and an Intel Xeon central processing unit (CPU), showing great improvement over prior works. The code is available at https://github.com/ASU-ESIC-FAN-Lab/Dynamic-inference.
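
The multi-subnet training objective mentioned in the abstract can be illustrated with a minimal sketch. It assumes a toy two-layer supernet in which a subnet of width k keeps only the first k hidden channels, and it sums the cross-entropy loss over a few sampled widths. The function names (`subnet_forward`, `multi_subnet_loss`, `sample_widths`) and the uniform random width sampling are hypothetical placeholders for illustration; they do not reproduce the paper's trainable noise ranking, channel-group sampling, or subnet filtering.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss of one sample against an integer class label."""
    return -math.log(softmax(logits)[label])


def subnet_forward(x, w1, w2, width):
    """Forward pass that uses only the first `width` hidden channels.

    w1 has shape [hidden][inputs]; w2 has shape [classes][hidden].
    Slicing by channel index is an assumption of this sketch.
    """
    hidden = [
        max(0.0, sum(w1[j][i] * x[i] for i in range(len(x))))  # ReLU
        for j in range(width)
    ]
    return [sum(w2[c][j] * hidden[j] for j in range(width)) for c in range(len(w2))]


def multi_subnet_loss(x, label, w1, w2, widths):
    """Sum of cross-entropy losses over the given subnet widths."""
    return sum(cross_entropy(subnet_forward(x, w1, w2, k), label) for k in widths)


def sample_widths(max_width, n, rng):
    """Placeholder sampler: n distinct widths drawn uniformly at random."""
    return rng.sample(range(1, max_width + 1), n)


if __name__ == "__main__":
    rng = random.Random(0)
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 4 hidden, 3 inputs
    w2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 3 classes
    widths = sample_widths(max_width=4, n=2, rng=rng)
    print(widths, multi_subnet_loss([1.0, 2.0, 3.0], 0, w1, w2, widths))
```

Because every subnet shares the supernet's weights, a gradient step on this summed loss updates all sampled subnets at once, which is why the choice of which subnets to sample matters for the final accuracy of each one.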

ProgressiveSpinalNet architecture for FC layers

Deep Neural Network Acceleration with Sparse Prediction Layers

Efficient Structure Slimming for Spiking Neural Networks

Deep CovDenseSNN: A Hierarchical Event-Driven Dynamic Framework with Spiking Neurons in Noisy Environment

Biologically Inspired Structure Learning with Reverse Knowledge Distillation for Spiking Neural Networks

IM-LIF: Improved Neuronal Dynamics with Attention Mechanism for Direct Training Deep Spiking Neural Network

An Efficient Learning Algorithm for Direct Training Deep Spiking Neural Networks

Adaptive Multi-Level Firing for Direct Training Deep Spiking Neural Networks

An Adaptive and Stability-Promoting Layerwise Training Approach for Sparse Deep Neural Network Architecture

Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks

Progressive Principle Component Analysis for Compressing Deep Convolutional Neural Networks

Progressive Feature Interaction Search for Deep Sparse Network

A Progressive Subnetwork Searching Framework for Dynamic Inference

Neural Architecture Search using Progressive Evolution

Progressive Neural Networks for Image Classification

A Progressive Training Framework for Spiking Neural Networks with Learnable Multi-hierarchical Model

Make Deep Networks Shallow Again

Advancing Spiking Neural Networks Toward Deep Residual Learning

EvoPruneDeepTL: An Evolutionary Pruning Model for Transfer Learning based Deep Neural Networks

Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

Input Fast-Forwarding for Better Deep Learning