Residual Alignment: Uncovering the Mechanisms of Residual Networks

Jianing Li, Vardan Papyan
2024-01-17
Abstract: The ResNet architecture has been widely adopted in deep learning due to its significant boost to performance through the use of simple skip connections, yet the underlying mechanisms leading to its success remain largely unknown. In this paper, we conduct a thorough empirical study of the ResNet architecture in classification tasks by linearizing its constituent residual blocks using Residual Jacobians and measuring their singular value decompositions. Our measurements reveal a process called Residual Alignment (RA) characterized by four properties:
Machine Learning
What problem does this paper attempt to address?
This paper sets out to explain why the Residual Network (ResNet) architecture works so well. ResNets improve performance substantially through simple skip connections, yet there is no widely accepted theoretical explanation for this success. The authors study the architecture and its constituent residual blocks in depth to expose the internal mechanism and to explain why ResNets perform well across many tasks.

### Main Problem Statements

1. **The Success Mechanism of ResNet**: ResNet quickly became a mainstream deep learning architecture after it was proposed, but the specific mechanism behind its success remains unclear. The authors look for the key factors behind its performance gains by studying the internal structure and behavior of ResNets.
2. **The Role of Skip Connections**: Skip connections are a core design element of ResNet, but their effect on model performance is not fully understood. The authors test experimentally whether skip connections are the key to ResNet's success and examine what role they play.
3. **The Geometric Structure of Intermediate Representations**: Do the intermediate-layer representations of a ResNet have a particular geometric structure, and does that structure contribute to the model's ability to generalize? The authors characterize the geometry of intermediate representations by analyzing the Residual Jacobians of the network.
4. **The Relationship with Neural Collapse**: Does Neural Collapse occur alongside this structure during ResNet training, and if so, are the two phenomena intrinsically connected? The authors examine the relationship experimentally.
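
To make the Residual Jacobian from item 3 concrete, here is a minimal sketch. It assumes a toy fully-connected residual block `h_next = h + F(h)` with `F(h) = W2 @ tanh(W1 @ h)`, and takes the Residual Jacobian to be the Jacobian of the residual branch `F` with respect to its input. The block shape, the `tanh` nonlinearity, the random weights, and the function names are illustrative assumptions, not the architecture or code used in the paper.

```python
import numpy as np

def residual_branch(h, W1, W2):
    """Toy residual branch F(h) = W2 @ tanh(W1 @ h) (an assumed form)."""
    return W2 @ np.tanh(W1 @ h)

def residual_jacobian(h, W1, W2):
    """Analytic Jacobian dF/dh of the toy branch, evaluated at h."""
    pre = W1 @ h
    # Chain rule: d tanh(z)/dz = 1 - tanh(z)^2, applied elementwise.
    return W2 @ (np.diag(1.0 - np.tanh(pre) ** 2) @ W1)

rng = np.random.default_rng(0)
d, hidden = 6, 8                      # representation width, branch width
W1 = rng.normal(size=(hidden, d)) / np.sqrt(d)
W2 = rng.normal(size=(d, hidden)) / np.sqrt(hidden)
h = rng.normal(size=d)                # one intermediate representation

h_next = h + residual_branch(h, W1, W2)   # skip connection plus branch
J = residual_jacobian(h, W1, W2)          # linearization of the branch at h
U, S, Vt = np.linalg.svd(J)               # singular vectors and values of J
```

The columns of `U` and the rows of `Vt` are the left and right singular vectors, and `S` holds the singular values in descending order. For a real network one would typically obtain `J` with automatic differentiation rather than a hand-written derivative.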
### Overview of Research Methods

To answer these questions, the authors use the following methods:

- **Linearizing Residual Blocks**: Each residual block is linearized through its Residual Jacobian, and the Singular Value Decomposition (SVD) of that Jacobian is measured to analyze the geometric structure of intermediate representations.
- **Empirical Research**: ResNet models are trained on multiple benchmark datasets (such as MNIST and CIFAR10) at different depths, widths, and numbers of classes to check which characteristics appear consistently.
- **Counterfactual Experiments**: Skip connections are removed, or other hyperparameters are changed, and the effect on ResNet behavior is observed to test the importance of skip connections.
- **Mathematical Model**: An Unconstrained Jacobians Model is proposed, in which the conditions for Residual Alignment (RA) to occur are proved theoretically.

### Main Phenomena Discovered

The authors identify a phenomenon called "Residual Alignment" (RA), which has four characteristics:

1. **(RA1)**: For a given input, the intermediate representations are equally spaced along a straight line embedded in high-dimensional space.
2. **(RA2)**: The top left and right singular vectors of the Residual Jacobians align with each other and across different depths.
3. **(RA3)**: For fully-connected ResNets, the rank of each Residual Jacobian is at most the number of classes \( C \).
4. **(RA4)**: The top singular values of the Residual Jacobians scale inversely with depth.

Together, these characteristics reveal a highly ordered geometric structure in ResNet's internal representations, which may be one of the important reasons for its success.

### Conclusions

Through detailed empirical research and theoretical analysis, the authors shed light on the mechanism behind ResNet's success, in particular the key role played by skip connections. They also identify the Residual Alignment (RA) phenomenon and verify across multiple experiments that it occurs consistently and in a wide range of settings. These findings deepen the understanding of ResNet and suggest new perspectives and directions for future research.
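
The RA properties can be turned into simple numerical checks. The sketch below is one hedged way to do so on synthetic data that satisfies RA1, RA2, and RA3 by construction. The helper names (`equispacing_score`, `top_k_alignment`, `make_jacobian`), the choice of metrics, and the synthetic Jacobians are all assumptions made for illustration. They are not the measurements or models reported in the paper.

```python
import numpy as np

def equispacing_score(reps):
    """Return (mean cosine between consecutive increments,
    min/max ratio of increment norms). Both equal 1.0 when the
    rows of `reps` are equally spaced along a straight line."""
    steps = np.diff(reps, axis=0)              # increments h_{i+1} - h_i
    norms = np.linalg.norm(steps, axis=1)
    unit = steps / norms[:, None]
    cos = np.sum(unit[:-1] * unit[1:], axis=1)
    return cos.mean(), norms.min() / norms.max()

def top_k_alignment(J_a, J_b, k):
    """|U_a^T U_b| for the top-k left singular vectors of two Jacobians.
    Close to the identity matrix when the vectors align (up to sign)."""
    U_a = np.linalg.svd(J_a)[0][:, :k]
    U_b = np.linalg.svd(J_b)[0][:, :k]
    return np.abs(U_a.T @ U_b)

rng = np.random.default_rng(0)
d, C, depth = 8, 3, 5                          # width, classes, number of blocks

# Synthetic trajectory: exactly equispaced on a line (RA1 by construction).
h0, v = rng.normal(size=d), rng.normal(size=d)
reps = h0 + np.arange(depth + 1)[:, None] * v

# Synthetic rank-C Jacobians sharing one orthonormal basis Q for both left
# and right singular vectors (RA2 and RA3 by construction).
Q = np.linalg.qr(rng.normal(size=(d, d)))[0]

def make_jacobian(scale):
    s = np.zeros(d)
    s[:C] = scale * np.array([3.0, 2.0, 1.0])  # distinct top singular values
    return Q @ np.diag(s) @ Q.T

J_a, J_b = make_jacobian(1.0), make_jacobian(0.5)

cos_mean, norm_ratio = equispacing_score(reps)   # RA1 check
align = top_k_alignment(J_a, J_b, C)             # RA2 check across two blocks
rank = np.linalg.matrix_rank(J_b, tol=1e-10)     # RA3 check
```

On a trained network, `reps` would be the stacked intermediate representations of one input and `J_a`, `J_b` the Residual Jacobians of two blocks. Scores near 1, an alignment matrix near the identity, and a numerical rank of at most `C` would then be consistent with RA1 through RA3.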