Abstract: Recently, both deep neural networks and training datasets have grown drastically in size in pursuit of better practical performance. With the prevalence of transformer-based models in vision tasks, even more pressure is placed on GPU platforms to train these heavy models, which consumes a large amount of time and computing resources. It is therefore crucial to accelerate the training of deep neural networks. In this paper, we propose a general network expansion method to reduce the practical time cost of model training. Specifically, we exploit both width- and depth-level sparsity of dense models to accelerate training. First, we pick a sparse sub-network from the original dense model, with fewer parameters, as the starting point of training. The sparse architecture then gradually expands during training and finally grows into the dense one. We design different expansion strategies for CNNs and ViTs, owing to the great heterogeneity between the two architectures. Our method can be easily integrated into popular deep learning frameworks and saves considerable training time and hardware resources. Extensive experiments show that our acceleration method significantly speeds up the training of modern vision models on general GPU devices with a negligible performance drop (e.g., 1.42x faster for ResNet-101 and 1.34x faster for DeiT-base on ImageNet-1k). The code is available at https://github.com/huawei-noah/Efficient-Computing/tree/master/TrainingAcceleration/NetworkExpansion and https://gitee.com/mindspore/hub/blob/master/mshub_res/assets/noah-cvlab/gpu/1.8/networkexpansion_v1.0_imagenet2012.md.
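
The grow-from-sparse idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the staged linear schedule, the zero-filled expansion, and the names `width_schedule` and `expand_weights` are all assumptions made for illustration, and the abstract does not specify how the actual expansion strategies for CNNs and ViTs work.

```python
def width_schedule(total_epochs, num_stages, start_ratio=0.5):
    """Return the fraction of the dense width that is active at each epoch.

    Hypothetical schedule: training is split into `num_stages` equal stages,
    and the active width grows linearly from `start_ratio` to 1.0 across them.
    """
    if not 0.0 < start_ratio <= 1.0:
        raise ValueError("start_ratio must be in (0, 1]")
    if num_stages < 1 or total_epochs < num_stages:
        raise ValueError("need 1 <= num_stages <= total_epochs")
    ratios = []
    for epoch in range(total_epochs):
        stage = epoch * num_stages // total_epochs  # 0 .. num_stages - 1
        if num_stages == 1:
            ratio = 1.0
        else:
            ratio = start_ratio + (1.0 - start_ratio) * stage / (num_stages - 1)
        ratios.append(ratio)
    return ratios


def expand_weights(small, out_dim, in_dim, fill=0.0):
    """Embed a small weight matrix in the top-left corner of a larger one.

    `small` is a list of rows. New entries are set to `fill` (an assumed
    initialization; other choices are possible).
    """
    if len(small) > out_dim or any(len(row) > in_dim for row in small):
        raise ValueError("target shape must not be smaller than the source")
    big = [[fill] * in_dim for _ in range(out_dim)]
    for i, row in enumerate(small):
        for j, value in enumerate(row):
            big[i][j] = value
    return big


if __name__ == "__main__":
    print(width_schedule(total_epochs=6, num_stages=3, start_ratio=0.5))
    print(expand_weights([[1, 2], [3, 4]], out_dim=3, in_dim=3))
```

In this sketch the early epochs run on a narrower layer, which is where any time saving would come from, and the final stage trains the full dense layer. A depth-level analogue would add layers per stage instead of channels.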

A New Approach to Compute CNNs for Extremely Large Images

Enabling Efficient Fast Convolution Algorithms on GPUs Via MegaKernels

Accelerating Spatiotemporal Supervised Training of Large-Scale Spiking Neural Networks on GPU

Efficient Classification of Very Large Images with Tiny Objects

Accelerating convolutional neural network by exploiting sparsity on GPUs

Training Multiscale-CNN for Large Microscopy Image Classification in One Hour

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs

Effect of neural network structure in accelerating performance and accuracy of a convolutional neural network with GPU/TPU for image analytics

Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture

Layer-Wise Mixed-Modes CNN Processing Architecture With Double-Stationary Dataflow and Dimension-Reshape Strategy

CS-CNN: Enabling Robust and Efficient Convolutional Neural Networks Inference for Internet-of-Things Applications

CSPNet: A New Backbone that can Enhance Learning Capability of CNN

Network Expansion for Practical Training Acceleration

Fast and accurate variable batch size convolution neural network training on large scale distributed systems

cuSCNN: A Secure and Batch-Processing Framework for Privacy-Preserving Convolutional Neural Network Prediction on GPU

Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-Based Applications on Mobile SoCs

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

Channel and filter parallelism for large-scale CNN training

An efficient approach to escalate the speed of training convolution neural networks

A Novel Memory-Scheduling Strategy for Large Convolutional Neural Network on Memory-Limited Devices

NeuroFlux: Memory-Efficient CNN Training Using Adaptive Local Learning