Abstract: In deep learning, neural networks serve as noisy channels between input data and its latent representation. This perspective naturally relates deep learning to the pursuit of channels with optimal performance in information transmission and representation. While considerable effort is devoted to realizing optimal channel properties during network optimization, we study a frequently overlooked possibility: that neural networks can be initialized toward optimal channels. Our theory, consistent with experimental validation, identifies the primary mechanisms underlying this possibility and suggests intrinsic connections between statistical physics and deep learning. Unlike conventional theories that characterize neural networks using the classic mean-field approximation, we offer an analytic proof that this widely applied simplification is not appropriate for studying neural networks as information channels. To fill this gap, we develop a restricted mean-field framework that characterizes the limiting behaviors of information propagation in neural networks without strong assumptions on inputs. Building on this framework, we prove analytically that the mutual information between inputs and propagated signals is maximized when neural networks are initialized at dynamic isometry, a case where information is transmitted via norm-preserving mappings. These theoretical predictions are validated by experiments on real neural networks, suggesting that our theory is robust against finite-size effects. Finally, we analyze our findings with information bottleneck theory to confirm the precise relations among dynamic isometry, mutual information maximization, and optimal channel properties in deep learning. Our work may lay a foundation for improving network initialization in deep learning and suggests general statistical physics mechanisms underlying diverse deep learning techniques.
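The norm-preserving property that the abstract associates with dynamic isometry can be illustrated with a toy example. The sketch below is not the authors' restricted mean-field framework. It assumes a purely linear network with no activation functions, builds orthogonal layers from Householder reflections (orthogonal, though not uniformly distributed over the orthogonal group), and compares them with Gaussian layers scaled by 1/sqrt(n). All function and variable names are hypothetical.

```python
import math
import random


def householder(v):
    """Return the reflection H = I - 2 v v^T / (v^T v), an orthogonal matrix."""
    n = len(v)
    vv = sum(c * c for c in v)
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(n)]
        for i in range(n)
    ]


def matvec(w, x):
    """Multiply matrix w (nested lists) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in x))


def norm_ratio(weights, x):
    """Propagate x through linear layers and return ||output|| / ||input||."""
    y = x
    for w in weights:
        y = matvec(w, y)
    return norm(y) / norm(x)


rng = random.Random(0)
n, depth = 32, 20  # assumed toy width and depth
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Orthogonal layers: every singular value of each layer equals 1.
ortho_layers = [
    householder([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(depth)
]

# Gaussian layers with variance 1/n: norms are preserved only on average.
gauss_layers = [
    [[rng.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(n)] for _ in range(n)]
    for _ in range(depth)
]

ratio_ortho = norm_ratio(ortho_layers, x)
ratio_gauss = norm_ratio(gauss_layers, x)
print("orthogonal layers:", ratio_ortho)
print("gaussian layers:  ", ratio_gauss)
```

With orthogonal layers the ratio stays at 1 up to floating-point rounding regardless of depth, whereas with Gaussian layers it fluctuates from draw to draw. The nonlinear, finite-width setting studied in the paper requires the analysis described in the abstract and is not captured by this linear toy.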

Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

Explicitising The Implicit Intrepretability of Deep Neural Networks Via Duality

Modular Duality in Deep Learning

From Activation to Initialization: Scaling Insights for Optimizing Neural Fields

On the Expressive Power of Deep Neural Networks

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

On the Equivalence Between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint

Duality Principle and Biologically Plausible Learning: Connecting the Representer Theorem and Hebbian Learning

Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint

Statistical Physics of Deep Neural Networks: Initialization Toward Optimal Channels

Mehler's Formula, Branching Process, and Compositional Kernels of Deep Neural Networks

Revealing the Structure of Deep Neural Networks via Convex Duality

On Symmetry and Initialization for Neural Networks

Two Sparsities Are Better Than One: Unlocking the Performance Benefits of Sparse-Sparse Networks

Deep Learning 2.0: Artificial Neurons That Matter -- Reject Correlation, Embrace Orthogonality

Black Boxes and Looking Glasses: Multilevel Symmetries, Reflection Planes, and Convex Optimization in Deep Networks

Low-dimensional Intrinsic Dimension Reveals a Phase Transition in Gradient-Based Learning of Deep Neural Networks

Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications

On Privileged and Convergent Bases in Neural Network Representations

Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks

The Expressive Power of Neural Networks: A View from the Width