Abstract: Deep neural networks (DNNs) have thrived in recent years, and batch normalization (BN) plays an indispensable role in their training. However, BN is costly: its large reduction and elementwise operations are hard to execute in parallel, which substantially slows training. To address this issue, we propose a methodology that reduces the cost of BN by using only a few sampled or generated data points for mean and variance estimation at each iteration. The key challenge is to strike a satisfactory balance between normalization effectiveness and execution efficiency. We identify that effectiveness calls for less data correlation in sampling, while efficiency calls for more regular execution patterns. To this end, we design two categories of approach: sampling or creating a few uncorrelated data points for statistics estimation under certain strategy constraints. The former includes "batch sampling (BS)," which randomly selects a few samples from each batch, and "feature sampling (FS)," which randomly selects a small patch from each feature map of all samples. The latter is "virtual data set normalization (VDN)," which generates a few synthetic random samples to directly create uncorrelated data for statistics estimation. Accordingly, multiway strategies are designed to reduce data correlation for accurate estimation while optimizing the execution pattern for acceleration. The proposed methods are comprehensively evaluated on various DNN models, where the losses in model accuracy and convergence rate are negligible. Without the support of any specialized libraries, a 1.98× BN layer acceleration and a 23.2% overall training speedup can be achieved in practice on modern GPUs. Furthermore, our methods perform strongly on the well-known "micro-BN" problem that arises with a tiny batch size. This article provides a promising solution for the efficient training of high-performance DNNs.
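The two sampling strategies named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`full_stats`, `bs_stats`, `fs_stats`, `normalize`), the `(N, C, H, W)` layout, and the choice to share one patch location across all feature maps are assumptions made for illustration only. VDN is omitted because it requires passing synthetic samples through the network itself.

```python
import numpy as np

def full_stats(x):
    """Standard BN statistics: per-channel mean/variance over the whole batch.
    x has the assumed layout (N, C, H, W)."""
    return x.mean(axis=(0, 2, 3)), x.var(axis=(0, 2, 3))

def bs_stats(x, n_samples, rng):
    """Batch sampling (BS) sketch: estimate statistics from a few
    randomly selected samples of the batch."""
    idx = rng.choice(x.shape[0], size=n_samples, replace=False)
    sub = x[idx]
    return sub.mean(axis=(0, 2, 3)), sub.var(axis=(0, 2, 3))

def fs_stats(x, patch, rng):
    """Feature sampling (FS) sketch: estimate statistics from a small
    patch of every feature map of all samples.
    Assumption: one random patch location is shared by all maps, which
    keeps the memory access pattern regular."""
    _, _, h, w = x.shape
    top = rng.integers(0, h - patch + 1)
    left = rng.integers(0, w - patch + 1)
    sub = x[:, :, top:top + patch, left:left + patch]
    return sub.mean(axis=(0, 2, 3)), sub.var(axis=(0, 2, 3))

def normalize(x, mean, var, eps=1e-5):
    """Normalize the full batch with the (possibly estimated) statistics."""
    mean = mean[None, :, None, None]
    var = var[None, :, None, None]
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(64, 8, 16, 16))

mean_bs, var_bs = bs_stats(x, n_samples=8, rng=rng)  # reduces over 8 of 64 samples
mean_fs, var_fs = fs_stats(x, patch=4, rng=rng)      # reduces over a 4x4 patch per map
y = normalize(x, mean_bs, var_bs)                    # all activations are still normalized
```

In both cases the reduction touches only a fraction of the activations (1/8 for BS and 1/16 for FS in this toy setup), while the normalization itself is still applied to the whole batch. The toy input here is i.i.d. noise, so it does not exhibit the data correlation that the abstract identifies as the obstacle to accurate estimation on real feature maps.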

Effective and Efficient Batch Normalization Using a Few Uncorrelated Data for Statistics Estimation