Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations

Steffen Schotthöfer, Emanuele Zangrando, Jonas Kusch, Gianluca Ceruti, Francesco Tudisco
DOI: https://doi.org/10.48550/arXiv.2205.13571
2022-10-18
Abstract: Neural networks have achieved tremendous success in a large variety of applications. However, their memory footprint and computational demand can render them impractical in application settings with limited hardware or energy resources. In this work, we propose a novel algorithm to find efficient low-rank subnetworks. Remarkably, these subnetworks are determined and adapted already during the training phase and the overall time and memory resources required by both training and evaluating them are significantly reduced. The main idea is to restrict the weight matrices to a low-rank manifold and to update the low-rank factors rather than the full matrix during training. To derive training updates that are restricted to the prescribed manifold, we employ techniques from dynamic model order reduction for matrix differential equations. This allows us to provide approximation, stability, and descent guarantees. Moreover, our method automatically and dynamically adapts the ranks during training to achieve the desired approximation accuracy. The efficiency of the proposed method is demonstrated through a variety of numerical experiments on fully-connected and convolutional networks.
Machine Learning, Artificial Intelligence, Numerical Analysis
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper "Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations" addresses the high computational and memory requirements of neural networks during training and evaluation. Specifically, it proposes a new algorithm that finds efficient low-rank subnetworks during training. These subnetworks significantly reduce the time and memory required for both training and evaluation, and their ranks are adapted automatically while accuracy stays high.

### Main contributions

1. **Discovery and adaptation of low-rank subnetworks**:
   - Proposes a method that determines and adjusts low-rank subnetworks dynamically during training.
   - This is achieved by restricting the weight matrices to a low-rank manifold and updating the low-rank factors, rather than the full matrices, during training.
2. **Training method based on matrix differential equations**:
   - Uses dynamic model order reduction techniques to derive training updates that stay on the prescribed manifold.
   - Provides approximation, stability, and descent guarantees.
3. **Adaptive rank adjustment**:
   - The method adjusts the ranks automatically and dynamically during training to achieve the required approximation accuracy.
4. **Experimental verification**:
   - A variety of numerical experiments on fully-connected and convolutional networks show that the low-rank subnetworks reduce memory storage and computational cost while keeping accuracy comparable to that of full-rank networks.

### Background and motivation

Modern neural networks have achieved great success in many applications, but their high computational and memory requirements make them impractical in settings with limited hardware or energy resources.
The paper addresses this problem with an efficient low-rank training method. The method significantly reduces resource requirements during training and evaluation, and it finds high-performance low-rank subnetworks automatically during training.

### Technical details

1. **Optimization on the low-rank manifold**:
   - Restricts the weight matrices to a low-rank manifold and updates the low-rank factors during training.
   - Interprets training as a continuous-time gradient flow and applies a low-rank numerical integrator to obtain modified forward and backward training steps.
2. **Dynamic low-rank approximation (DLRA)**:
   - Approximates the solution by a low-rank decomposition and derives evolution equations for each factor.
   - Uses a stable numerical integrator so that the method remains robust in the presence of small singular values.
3. **Adaptive rank adjustment**:
   - Selects an appropriate approximation rank from the continuous-time training dynamics, so that high-performance low-rank subnetworks are found directly during training.

### Experimental results

The paper demonstrates the effectiveness of the low-rank training method through experiments on the MNIST dataset. The results show that, for a sufficiently small rank, low-rank training is faster than the full-rank baseline in both training and prediction time while maintaining high accuracy. In addition, the adaptive rank adjustment algorithm reduces the rank significantly early in training and keeps it low afterwards, which further improves efficiency.
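
To make the low-rank factorization and the rank selection described under "Technical details" more concrete, here is a minimal NumPy sketch. It is not the paper's integrator: it uses a plain truncated SVD of a fixed matrix as a stand-in for the rank-adaptation step, and the function names `truncate_rank`, `low_rank_forward`, and `param_counts`, the relative tolerance `tol`, and the layer sizes are all assumptions made for illustration.

```python
import numpy as np


def truncate_rank(W, tol):
    """Factor W ~ U @ S @ V.T with the smallest rank meeting a relative tolerance.

    Illustrative helper (not from the paper): keeps the fewest singular
    values such that the Frobenius norm of the discarded part is at most
    tol * ||W||_F.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # discarded[k] = Frobenius error when only the k largest singular values are kept
    discarded = np.sqrt(np.append(np.cumsum((s ** 2)[::-1])[::-1], 0.0))
    # The last entry is 0, so at least one rank always satisfies the tolerance.
    r = int(np.nonzero(discarded[1:] <= tol * np.linalg.norm(s))[0][0]) + 1
    return U[:, :r], np.diag(s[:r]), Vt[:r, :].T


def low_rank_forward(U, S, V, x):
    """Apply the factored layer without ever forming the full m x n matrix."""
    return U @ (S @ (V.T @ x))


def param_counts(m, n, r):
    """Stored entries: full m x n matrix versus the factors U (m x r), S (r x r), V (n x r)."""
    return m * n, r * (m + n) + r * r


# Hypothetical 64 x 32 layer whose weight matrix has exact rank 3.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 3)) @ rng.standard_normal((3, 32))
U, S, V = truncate_rank(W, tol=1e-8)
x = rng.standard_normal(32)

print("selected rank:", S.shape[0])
print("forward pass matches:", np.allclose(low_rank_forward(U, S, V, x), W @ x))
print("parameters (full, low-rank):", param_counts(64, 32, S.shape[0]))
```

The point of the sketch is the cost structure: the factored layer stores and multiplies `r * (m + n) + r * r` numbers instead of `m * n`, which pays off only when the rank `r` is small relative to the layer dimensions. In the paper the factors are updated directly during training rather than obtained from an SVD of a full matrix.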

### Conclusion

The paper proposes an efficient low-rank training method that finds high-performance low-rank subnetworks automatically during training, significantly reducing the computational and memory requirements of neural networks while maintaining high accuracy. This opens new possibilities for deploying neural networks on resource-limited devices.