Abstract: Deep learning has recently made remarkable strides, especially in generative modeling with large language models and probabilistic diffusion models. However, training these models often demands significant computational resources, on the order of billions of petaFLOPs. This resource consumption results in substantial energy usage and a large carbon footprint, raising critical environmental concerns. Back-propagation (BP) is a major source of computational expense when training deep learning models. To advance research on energy-efficient training and enable sparse learning on any machine and device, we propose a general, energy-efficient convolution module that can be seamlessly integrated into any deep learning architecture. Specifically, we introduce channel-wise sparsity with additional gradient selection schedulers during the backward pass, based on the assumption that BP is often dense and inefficient, which can lead to over-fitting and high computational cost. Our experiments demonstrate that our approach reduces computation by 40% while potentially improving model performance, as validated on image classification and generation tasks. This reduction can lead to significant energy savings and a lower carbon footprint during the research and development phases of large-scale AI systems. Additionally, our method mitigates over-fitting in a manner distinct from Dropout, so the two can be combined to further enhance model performance and reduce computational resource usage. Extensive experiments validate that our method generalizes to a variety of datasets and tasks and is compatible with a wide range of deep learning architectures and modules. Code is publicly available at https://github.com/lujiazho/ssProp.
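The channel-wise gradient sparsity described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `sparsify_channel_grads` and `drop_rate_schedule`, the mean-absolute-gradient importance score, and the alternating dense/sparse schedule are all assumptions made for illustration, since the abstract does not specify the selection criterion or the scheduler.

```python
import numpy as np


def sparsify_channel_grads(grad_output, drop_rate):
    """Zero the lowest-scoring channels of an (N, C, H, W) output gradient.

    Assumption: channel importance is the mean absolute gradient per
    channel. The abstract does not state the actual criterion.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    n_channels = grad_output.shape[1]
    # Always keep at least one channel so some gradient flows back.
    n_keep = max(1, int(round(n_channels * (1.0 - drop_rate))))
    scores = np.abs(grad_output).mean(axis=(0, 2, 3))
    keep = np.argsort(scores)[-n_keep:]
    mask = np.zeros(n_channels, dtype=bool)
    mask[keep] = True
    # Broadcast the channel mask over batch and spatial dimensions.
    return grad_output * mask[None, :, None, None], mask


def drop_rate_schedule(epoch, period=2, rate=0.4):
    """Hypothetical scheduler: alternate dense and sparse epochs."""
    return rate if epoch % period == 1 else 0.0


rng = np.random.default_rng(0)
grad = rng.normal(size=(2, 10, 4, 4))
sparse_grad, kept = sparsify_channel_grads(grad, drop_rate_schedule(epoch=1))
```

In a real convolution layer, the saving would come from skipping the weight-gradient and input-gradient computations for the dropped channels, rather than from multiplying by a mask as this sketch does.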

ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation
