Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation

Xin Yuan, Pedro Savarese, Michael Maire
2023-06-22
Abstract: We develop an approach to efficiently grow neural networks, within which parameterization and optimization strategies are designed by considering their effects on the training dynamics. Unlike existing growing methods, which follow simple replication heuristics or utilize auxiliary gradient-based local optimization, we craft a parameterization scheme which dynamically stabilizes weight, activation, and gradient scaling as the architecture evolves, and maintains the inference functionality of the network. To address the optimization difficulty resulting from imbalanced training effort distributed to subnetworks fading in at different growth phases, we propose a learning rate adaption mechanism that rebalances the gradient contribution of these separate subcomponents. Experimental results show that our method achieves comparable or better accuracy than training large fixed-size models, while saving a substantial portion of the original computation budget for training. We demonstrate that these gains translate into real wall-clock training speedups.
Machine Learning
### What problem does this paper attempt to solve?

This paper addresses the high cost of training large neural networks and proposes a method for growing a network efficiently during training. Its goals are:

1. **Improving Training Efficiency**:
   - Training starts from a small model and gradually increases the network width, which saves a substantial portion of the original computation budget.
   - Compared with directly training a large fixed-size model, the method reaches comparable or better accuracy at lower computational cost.
2. **Maintaining Functional Continuity**:
   - Newly added parameters should not disrupt the function of the existing network when it is expanded.
   - A parameterization scheme (variance transfer) dynamically stabilizes weight, activation, and gradient scaling, so the network keeps its inference behavior as the architecture grows.
3. **Adapting Learning Rates**:
   - Subnetworks introduced at different growth phases receive imbalanced training effort; a learning rate adaptation mechanism rebalances the gradient contribution of each subcomponent.
   - This compensates for subnetworks that are added at later stages and have therefore been trained for less time.
4. **Broad Applicability and Acceleration Effects**:
   - The method applies to image classification and to other tasks such as machine translation, and it performs well across various network architectures.
   - Experiments show that the savings in computation translate into real wall-clock training speedups.

In summary, the paper tackles efficiency and optimization issues in large-model training through a network growing framework, achieving faster training with comparable or better accuracy than fixed-size baselines.
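To make "functional continuity" concrete, here is a minimal sketch of widening a hidden layer without changing the network's outputs. It is not the paper's variance transfer scheme: it uses the simpler, generic trick of giving new hidden units zero outgoing weights. The names `forward` and `grow_hidden`, the layer sizes, and the initialization scale are all assumptions made for illustration.

```python
import math
import random


def forward(x, w1, w2):
    """Two-layer ReLU MLP: y = w2 @ relu(w1 @ x). Weights are lists of rows."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def grow_hidden(w1, w2, extra, rng):
    """Add `extra` hidden units without changing the network's outputs.

    New incoming rows get small random values; new outgoing columns are zero,
    so the added units contribute nothing until training updates them.
    Generic zero-init trick (an assumption here), NOT the paper's method.
    """
    fan_in = len(w1[0])
    scale = 1.0 / math.sqrt(fan_in)  # hypothetical init scale
    new_rows = [[rng.gauss(0.0, scale) for _ in range(fan_in)]
                for _ in range(extra)]
    grown_w1 = [row[:] for row in w1] + new_rows
    grown_w2 = [row[:] + [0.0] * extra for row in w2]
    return grown_w1, grown_w2


rng = random.Random(0)
w1 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]  # 3 inputs, 4 hidden
w2 = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]  # 4 hidden, 2 outputs
x = [0.5, -1.0, 2.0]

before = forward(x, w1, w2)
big_w1, big_w2 = grow_hidden(w1, w2, extra=4, rng=rng)  # hidden width 4 -> 8
after = forward(x, big_w1, big_w2)

# The wider network computes the same function on this input.
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(before, after))
```

Zero-initialized outgoing weights preserve the function but do nothing to control the scale of weights, activations, or gradients after growth. According to the abstract, stabilizing those scales is what the paper's parameterization is designed to do, and this sketch does not attempt it.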