Vision Transformer Pruning Via Matrix Decomposition

Tianyi Sun
2023-08-22
Abstract: This is a further development of Vision Transformer Pruning via matrix decomposition. The purpose of Vision Transformer Pruning is to prune the dimensions of the linear projection of the dataset by learning their associated importance scores, in order to reduce storage, run-time memory, and computational demands. In this paper we further reduce the dimension and complexity of the linear projection by implementing and comparing several matrix decomposition methods while preserving the generated important features. We ultimately select Singular Value Decomposition as the method to achieve our goal, by comparing the accuracy scores in the original GitHub repository with the accuracy scores obtained using these matrix decomposition methods: Singular Value Decomposition, four versions of QR Decomposition, and LU factorization.
Computer Vision and Pattern Recognition, Computation
What problem does this paper attempt to address?
The problem this paper attempts to address is how to further reduce the storage, run-time memory, and computational requirements of Vision Transformers through matrix decomposition while preserving the important features that the model generates. Specifically, the paper aims to further reduce the dimensionality and complexity of the linear projections by implementing and comparing several matrix decomposition methods: Singular Value Decomposition (SVD), four versions of QR decomposition, and LU decomposition. The paper selects SVD as the method to achieve its goal, basing the choice on a comparison between the accuracy scores from the original GitHub repository and the accuracy scores obtained with each decomposition method. The main contribution of the paper is a Vision Transformer pruning method based on matrix decomposition, which can reduce the storage and computational overhead of the model without significantly sacrificing its performance. This makes Vision Transformers more efficient and more feasible for practical applications.
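
To make the idea of reducing a linear projection with SVD concrete, here is a minimal NumPy sketch of a rank-truncated factorization. It is an illustration under stated assumptions, not the paper's implementation: the function name `truncated_svd_factors`, the matrix sizes, the chosen rank, and the synthetic weight matrix are all hypothetical, and the source does not say how the rank is chosen or where in the model the decomposition is applied.

```python
import numpy as np


def truncated_svd_factors(weight, rank):
    """Factor `weight` (d_out x d_in) into a @ b with inner dimension `rank`.

    Hypothetical helper for illustration; keeping the largest `rank` singular
    values gives the best rank-`rank` approximation in the Frobenius norm.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # (d_out x rank), columns scaled by singular values
    b = vt[:rank, :]            # (rank x d_in)
    return a, b


# Assumed toy setup: a projection that is nearly rank 8, plus small noise.
rng = np.random.default_rng(0)
d_out, d_in, rank = 48, 64, 8
weight = rng.standard_normal((d_out, rank)) @ rng.standard_normal((rank, d_in))
weight += 1e-3 * rng.standard_normal((d_out, d_in))

a, b = truncated_svd_factors(weight, rank)

# Relative reconstruction error of the low-rank product.
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)

# Parameter counts: one dense matrix versus the two thin factors.
params_full = weight.size       # d_out * d_in
params_low = a.size + b.size    # rank * (d_out + d_in)

print(f"relative error: {rel_error:.2e}")
print(f"parameters: {params_full} -> {params_low}")
```

Replacing one dense projection `x @ weight.T` with `(x @ b.T) @ a.T` saves parameters only when `rank * (d_out + d_in) < d_out * d_in`, so the achievable reduction depends on how quickly the singular values of the actual projection decay.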