Abstract: Overparameterized models have proven to be powerful tools for solving various machine learning tasks. However, overparameterization often leads to a substantial increase in computational and memory costs, which in turn requires extensive resources to train. In this work, we present a novel approach for compressing overparameterized models, developed through studying their learning dynamics. We observe that for many deep models, updates to the weight matrices occur within a low-dimensional invariant subspace. For deep linear models, we demonstrate that their principal components are fitted incrementally within a small subspace, and use these insights to propose a compression algorithm for deep linear networks that involves decreasing the width of their intermediate layers. We empirically evaluate the effectiveness of our compression technique on matrix recovery problems. Remarkably, by using an initialization that exploits the structure of the problem, we observe that our compressed network converges faster than the original network, consistently yielding smaller recovery errors. We substantiate this observation by developing a theory focused on deep matrix factorization. Finally, we empirically demonstrate how our compressed model has the potential to improve the utility of deep nonlinear models. Overall, our algorithm improves the training efficiency by more than 2x, without compromising generalization.
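
The observation that weight updates stay inside a low-dimensional invariant subspace can be checked numerically in a toy setting. The sketch below is illustrative only and rests on assumptions that are not taken from the paper: the target is a fully observed square rank-r matrix (no missing entries), every layer of a depth-3 linear network starts as eps times a random orthogonal matrix, training is plain gradient descent on 0.5 * ||W_L ... W_1 - phi||_F^2, and the helper names (orthogonal, product, train_dlf) and all hyperparameters are hypothetical. In this setting, a short induction (our own reasoning, not quoted from the source) shows that every W_l acts as the same shrinking multiple of its initialization on a subspace of dimension at least d - 2r, so at most 2r singular values of each trained layer should leave that common "bulk" value.

```python
import numpy as np


def orthogonal(d, rng):
    """Random d x d orthogonal matrix via QR of a Gaussian matrix."""
    q, upper = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(upper))  # fix column signs so Q is uniformly distributed


def product(ws):
    """Return W_L @ ... @ W_1 for ws = [W_1, ..., W_L] (ws must be non-empty)."""
    p = ws[0]
    for w in ws[1:]:
        p = w @ p
    return p


def train_dlf(phi, depth=3, eps=0.1, lr=0.1, steps=2000, seed=0):
    """Hypothetical helper: gradient descent on 0.5 * ||W_L ... W_1 - phi||_F^2
    starting from scaled orthogonal initialization (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    d = phi.shape[0]
    ws = [eps * orthogonal(d, rng) for _ in range(depth)]
    eye = np.eye(d)
    for _ in range(steps):
        err = product(ws) - phi
        grads = []
        for l in range(depth):
            left = product(ws[l + 1:]) if l + 1 < depth else eye   # W_L ... W_{l+1}
            right = product(ws[:l]) if l > 0 else eye              # W_{l-1} ... W_1
            grads.append(left.T @ err @ right.T)
        ws = [w - lr * g for w, g in zip(ws, grads)]  # update all layers simultaneously
    return ws


d, r = 30, 2
rng = np.random.default_rng(1)
u, _ = np.linalg.qr(rng.standard_normal((d, r)))
v, _ = np.linalg.qr(rng.standard_normal((d, r)))
phi = u @ np.diag([1.0, 0.5]) @ v.T  # square rank-r target, fully observed

ws = train_dlf(phi)
for l, w in enumerate(ws, start=1):
    s = np.linalg.svd(w, compute_uv=False)
    bulk = np.median(s)  # repeated value shared by the >= d - 2r untouched directions
    moved = int(np.sum(np.abs(s - bulk) > 1e-8))
    print(f"layer {l}: {moved} of {d} singular values differ from the bulk (2r = {2 * r})")
```

Each printed count should be at most 2r = 4. The remaining directions are never rotated by gradient descent in this toy problem; they only shrink slightly from eps, which is why they collapse onto one repeated singular value.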

What problem does this paper attempt to address?

The paper addresses the high computational cost and memory consumption of training overparameterized deep models. Overparameterized models are strong at solving a wide range of machine learning tasks and often generalize better, but they also substantially increase the demand for computational resources. This limits their use on large-scale problems, especially when computational resources are constrained.

To tackle this problem, the authors study the learning dynamics of overparameterized models and derive a compression method from them. They observe that in many deep models, the weight matrices are updated only within low-dimensional invariant subspaces. Based on this observation, they design a compression algorithm for Deep Linear Networks (DLNs) that compresses the model by reducing the width of its intermediate layers. Experiments show that, with a proper initialization, the compressed DLN converges faster than the original overparameterized model and attains a lower recovery error at every iteration of gradient descent.

The authors further show how these findings can accelerate the training of deep nonlinear models: by overparameterizing the penultimate layer of such a model and then applying the proposed compression technique, memory complexity and training time are reduced while test accuracy is maintained or even improved. In summary, the proposed compression method improves training efficiency by more than 2x without sacrificing generalization, particularly for low-rank matrix recovery problems.
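
Narrowing the intermediate layers of a DLN does not, by itself, limit what the network can represent when the target is low rank: any rank-r matrix is exactly a product of depth factors whose inner dimension is at least r. The sketch below checks this with a balanced SVD split and compares parameter counts between a full-width and a narrow network. It is not the authors' compression algorithm, which additionally relies on an initialization that exploits the problem structure and on gradient-based training; the helper names (param_count, compressed_factors), the illustrative width of 2r, and the sizes d = 100, r = 5, depth = 3 are all assumptions.

```python
import numpy as np


def param_count(d, width, depth):
    """Parameters of a depth-L linear network d -> width -> ... -> width -> d."""
    if depth == 1:
        return d * d
    return 2 * d * width + (depth - 2) * width * width


def compressed_factors(phi, width, depth):
    """Hypothetical helper: split phi (rank <= width, depth >= 2) into `depth` factors
    with inner dimension `width`, each carrying singular values ** (1 / depth)."""
    u, s, vt = np.linalg.svd(phi)
    k = width
    root = s[:k] ** (1.0 / depth)                        # balanced share per factor
    first = root[:, None] * vt[:k]                       # k x d
    middle = [np.diag(root) for _ in range(depth - 2)]   # k x k each
    last = u[:, :k] * root                               # d x k
    return [first, *middle, last]


d, r, depth = 100, 5, 3
rng = np.random.default_rng(0)
phi = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))  # rank-r target
ws = compressed_factors(phi, width=2 * r, depth=depth)

prod = ws[-1]
for w in reversed(ws[:-1]):  # multiply W_L @ ... @ W_1
    prod = prod @ w

print(np.allclose(prod, phi))  # the narrow network still represents phi
print(param_count(d, d, depth), param_count(d, 2 * r, depth))
```

For d = 100 and depth 3, the full-width network has 3d^2 = 30000 parameters, while the width-10 network has 2 * 100 * 10 + 10^2 = 2100, yet both can express the same rank-5 target.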

Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics

Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

On Compressing Deep Models by Low Rank and Sparse Decomposition

Compressing Deep Neural Networks With Sparse Matrix Factorization

Convolutional Neural Network Compression via Dynamic Parameter Rank Pruning

Convolutional Neural Network Compression Based on Low-Rank Decomposition

Low Rank Optimization for Efficient Deep Learning: Making A Balance between Compact Architecture and Fast Training

A Model Compression Method Using Significant Data and Knowledge Distillation

Hyper-Compression: Model Compression via Hyperfunction

Deep Learning Model Compression with Rank Reduction in Tensor Decomposition

Efficient Network Compression Through Smooth-Lasso Constraint

LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time

CMD: Controllable Matrix Decomposition with Global Optimization for Deep Neural Network Compression

Compressing deep neural networks by matrix product operators

Iterative Deep Model Compression and Acceleration in the Frequency Domain

On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee

Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network

Deep Learning Model Compression Techniques: Advances, Opportunities, and Perspective

Model Compression for Deep Neural Networks: A Survey

TDLC: Tensor decomposition‐based direct learning‐compression algorithm for DNN model compression

"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach