Abstract: The rapid development of deep neural networks (DNNs) in recent years can be attributed to the various techniques that address gradient explosion and vanishing. To understand the principle behind these techniques and to develop new methods, many metrics have been proposed to identify networks that are free of gradient explosion and vanishing. However, due to the diversity of network components and the complex serial-parallel hybrid connections in modern DNNs, evaluating existing metrics usually requires strong assumptions or complex statistical analysis, or applies only to limited settings, which constrains their adoption in the community. In this paper, inspired by Gradient Norm Equality and dynamical isometry, we first propose a novel metric called Block Dynamical Isometry, which measures the change of gradient norm in individual blocks. Because Block Dynamical Isometry is norm-based, its evaluation requires weaker assumptions than the original dynamical isometry. To ease the challenging derivations, we propose a highly modularized statistical framework based on free probability. Our framework includes several key theorems to handle complex serial-parallel hybrid connections and a library to cover the diversity of network components. In addition, several sufficient conditions for the prerequisites are provided. Powered by our metric and framework, we analyze a wide range of initialization techniques, normalization techniques, and network structures. We find that Block Dynamical Isometry is a universal philosophy behind them. We then improve some existing methods based on our analysis, including an activation function selection strategy for initialization techniques, a new configuration for weight normalization, a depth-aware way to derive the coefficients in SeLU, and initialization/weight normalization in DenseNet.
Moreover, we propose a novel normalization technique named second moment normalization, which has 30 percent less computational overhead than batch normalization without accuracy loss and performs better with micro-batch sizes. Last but not least, our conclusions and methods are supported by extensive experiments on multiple models over CIFAR-10 and ImageNet.
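As a rough illustration of what "the change of gradient norm in an individual block" can mean, the sketch below estimates, for a single block y = relu(W x), the normalized trace tr(J J^T) / m of the block Jacobian J. This quantity is the average factor by which the squared norm of a random back-propagated gradient is scaled. It is a minimal sketch under stated assumptions, not the paper's formal definition or framework: the helper names (`gaussian_matrix`, `relu_block_jacobian`, `normalized_trace_jjt`, `block_gain`), the Gaussian weights and input, and the block size are all hypothetical choices made for the example.

```python
import random


def gaussian_matrix(rows, cols, std, rng):
    """Matrix with i.i.d. zero-mean Gaussian entries of the given standard deviation."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def relu_block_jacobian(weight, x):
    """Jacobian of y = relu(W x) with respect to x.

    Rows of W whose pre-activation is not positive are zeroed out.
    """
    jac = []
    for row in weight:
        pre = sum(w * xi for w, xi in zip(row, x))
        jac.append(list(row) if pre > 0 else [0.0] * len(row))
    return jac


def normalized_trace_jjt(jac):
    """tr(J J^T) / m, i.e. the squared Frobenius norm of J divided by its row count."""
    return sum(v * v for row in jac for v in row) / len(jac)


def block_gain(var_scale, n=256, seed=0):
    """Estimate tr(J J^T) / m for one ReLU block with weight variance var_scale / n."""
    rng = random.Random(seed)
    weight = gaussian_matrix(n, n, (var_scale / n) ** 0.5, rng)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumed random input
    return normalized_trace_jjt(relu_block_jacobian(weight, x))


gain_unit = block_gain(1.0)  # weight variance 1/n: roughly half the rows are active
gain_he = block_gain(2.0)    # weight variance 2/n: compensates for the inactive rows
print(f"variance 1/n: {gain_unit:.3f}, variance 2/n: {gain_he:.3f}")
```

Under these assumptions the first value should land near 0.5 (the block shrinks the gradient norm on average) and the second near 1 (the norm is roughly preserved), which is the kind of per-block check the metric is meant to make tractable for more complex components.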

EvalDNN: A Toolbox for Evaluating Deep Neural Network Models

TBD: Benchmarking and Analyzing Deep Neural Network Training

DVTest: Deep Neural Network Visualization Testing Framework

DLBench: a comprehensive experimental evaluation of deep learning frameworks

Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking

DLBench: An Experimental Evaluation of Deep Learning Frameworks

Machine Learning-enabled Performance Model for DNN Applications and AI Accelerator

Novel Deep Neural Network Classifier Characterization Metrics with Applications to Dataless Evaluation

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Evaluating Deep Neural Networks in Deployment (A Comparative and Replicability Study)

TorchBench: Benchmarking PyTorch with High API Surface Coverage

PyNeval: A Python Toolbox for Evaluating Neuron Reconstruction Performance

A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks.

Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark

Investigating Deep Learning Benchmarks for Electrocardiography Signal Processing

Benchmarking Resource Usage for Efficient Distributed Deep Learning

Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training

A methodological framework for optimizing the energy consumption of deep neural networks: a case study of a cyber threat detector