Abstract: In deep learning, neural networks serve as noisy channels between input data and its latent representation. This perspective naturally relates deep learning to the pursuit of channels with optimal performance in information transmission and representation. While considerable effort is concentrated on realizing optimal channel properties during network optimization, we study a frequently overlooked possibility: that neural networks can be initialized toward optimal channels. Our theory, consistent with experimental validation, identifies the primary mechanisms underlying this possibility and suggests intrinsic connections between statistical physics and deep learning. Unlike conventional theories that characterize neural networks using the classic mean-field approximation, we offer an analytic proof that this widely applied simplification scheme is not appropriate for studying neural networks as information channels. To fill this gap, we develop a restricted mean-field framework for characterizing the limiting behaviors of information propagation in neural networks without strong assumptions on inputs. Based on this framework, we propose an analytic theory proving that mutual information between inputs and propagated signals is maximized when neural networks are initialized at dynamic isometry, a case where information transmits via norm-preserving mappings. These theoretical predictions are validated by experiments on real neural networks, suggesting that our theory is robust against finite-size effects. Finally, we analyze our findings with information bottleneck theory to confirm the precise relations among dynamic isometry, mutual information maximization, and optimal channel properties in deep learning. Our work may lay a cornerstone for improving deep learning through network initialization and suggest general statistical physics mechanisms underlying diverse deep learning techniques.
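As a rough illustration of what "initialized at dynamic isometry" means, the following minimal sketch compares two initializations of a deep linear network: orthogonal weights and scaled Gaussian weights. The deep linear setting, the width and depth values, and the helper names `orthogonal_matrix` and `jacobian_singular_values` are assumptions made for this example only; this is not the restricted mean-field framework of the paper. With orthogonal weights, every singular value of the input-output Jacobian equals 1, so the mapping is norm-preserving; with Gaussian weights the singular values spread widely as depth grows.

```python
import numpy as np

def orthogonal_matrix(n, rng):
    """Draw an n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fix column signs so the draw is uniform over orthogonal matrices.
    return q * np.sign(np.diag(r))

def jacobian_singular_values(weights):
    """Singular values of the input-output Jacobian of a deep linear network.

    For a linear network the Jacobian is simply the product of the weights.
    """
    jac = np.eye(weights[0].shape[1])
    for w in weights:
        jac = w @ jac
    return np.linalg.svd(jac, compute_uv=False)

rng = np.random.default_rng(0)
width, depth = 64, 20  # illustrative sizes, not taken from the paper

# Orthogonal initialization: an exactly norm-preserving map at every layer.
orth = [orthogonal_matrix(width, rng) for _ in range(depth)]
# Gaussian initialization with variance 1/width: norm-preserving only on average.
gauss = [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth)]

sv_orth = jacobian_singular_values(orth)
sv_gauss = jacobian_singular_values(gauss)

print("orthogonal: min/max singular value =", sv_orth.min(), sv_orth.max())
print("gaussian:   min/max singular value =", sv_gauss.min(), sv_gauss.max())
```

In this toy setting the orthogonal network keeps all singular values at 1 up to floating-point error, which is the norm-preserving property the abstract associates with dynamic isometry. Nonlinear networks need additional conditions on the activation and weight scale that this sketch does not cover.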

Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization

Exploring Machine Learning Privacy/Utility trade-off from a hyperparameters Lens

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

No Free Prune: Information-Theoretic Barriers to Pruning at Initialization

Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers

Privacy for Free in the Over-Parameterized Regime

Alleviating Barren Plateaus in Parameterized Quantum Machine Learning Circuits: Investigating Advanced Parameter Initialization Strategies

Can sparsity improve the privacy of neural networks?

Statistical Physics of Deep Neural Networks: Initialization Toward Optimal Channels

Differential Privacy Dynamics of Langevin Diffusion and Noisy Gradient Descent

Deconstructing the Goldilocks Zone of Neural Network Initialization

A Type of Generalization Error Induced by Initialization in Deep Neural Networks

On the Privacy of Noisy Stochastic Gradient Descent for Convex Optimization

An Improved Analysis of Training Over-parameterized Deep Neural Networks

Preserving Differential Privacy in Deep Neural Networks with Relevance-Based Adaptive Noise Imposition

On the Role of Initialization on the Implicit Bias in Deep Linear Networks

Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks

Do not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning

Theoretical Analysis of Privacy Leakage in Trustworthy Federated Learning: A Perspective from Linear Algebra and Optimization Theory