Abstract: Deep neural network (DNN) compression has become a hot topic in deep learning research because modern DNNs have grown too large to deploy on practical resource-constrained platforms such as embedded devices. Among the various compression methods, tensor decomposition is a relatively simple and efficient strategy owing to its solid mathematical foundations and regular data structure. The two most common approaches are tensorizing neural weights into higher-order tensors for better decomposition, and directly mapping an efficient tensor structure onto a neural architecture with nonlinear activation functions. However, considerable accuracy loss remains a drawback of the tensorizing approach, especially for convolutional neural networks (CNNs), while studies of the mapping approach are comparatively few and the compression ratios they achieve are modest. Therefore, in this work, after studying multiple types of tensor decomposition, we find that the tensor train (TT), with its specific and efficient sequenced contractions, has the potential to accommodate both the tensorizing and mapping approaches. We then propose a novel nonlinear tensor train (NTT) format, which embeds extra nonlinear activation functions in the sequenced contractions and convolutions on top of the normal TT decomposition and the proposed TT format connected by convolutions, to compensate for the accuracy loss incurred by normal TT. Beyond shrinking the space complexity of the original weight matrices and convolutional kernels, we prove that NTT also affords efficient inference time. Extensive experiments and discussions demonstrate that DNNs compressed in our NTT format can almost maintain their accuracy on at least the MNIST, UCF11 and CIFAR-10 datasets, and that the accuracy loss caused by normal TT can be compensated significantly on large-scale datasets such as ImageNet.
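
To make the idea of sequenced contractions concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a fully connected weight matrix stored as TT cores of shape `(r_prev, m_k, n_k, r_next)` and applies them to an input vector one core at a time. The optional `activation` argument, applied between consecutive contractions, is only one plausible reading of "nonlinear activation functions embedded in sequenced contractions"; the exact placement used by NTT is not stated in the abstract. The function names `tt_to_full` and `tt_apply` are hypothetical.

```python
import numpy as np


def tt_to_full(cores):
    """Rebuild the dense (M, N) matrix from TT cores shaped (r_prev, m_k, n_k, r_next)."""
    full = cores[0]
    for core in cores[1:]:
        # Contract the shared rank index, then merge the row modes and the column modes.
        full = np.einsum("aijr,rklb->aikjlb", full, core)
        a, i, k, j, l, b = full.shape
        full = full.reshape(a, i * k, j * l, b)
    return full[0, :, :, 0]  # boundary ranks are 1


def tt_apply(cores, x, activation=None):
    """Compute y = W x by contracting x with one TT core at a time.

    If `activation` is given, it is applied after every contraction except
    the last one (an assumed placement, for illustration only).
    """
    # state axes: (output modes produced so far, current rank, remaining input modes)
    state = np.asarray(x).reshape(1, 1, -1)
    for k, core in enumerate(cores):
        r_prev, m, n, r_next = core.shape
        done = state.shape[0]
        state = state.reshape(done, r_prev, n, -1)          # split off input mode n_k
        state = np.einsum("dpnt,pmnq->dmqt", state, core)   # sum over rank and n_k
        state = state.reshape(done * m, r_next, -1)         # absorb output mode m_k
        if activation is not None and k < len(cores) - 1:
            state = activation(state)
    return state.reshape(-1)


# Usage: a 32 x 32 weight factored as (4*8) x (4*8) with TT rank 3.
rng = np.random.default_rng(0)
cores = [rng.standard_normal((1, 4, 4, 3)), rng.standard_normal((3, 8, 8, 1))]
x = rng.standard_normal(32)

y_linear = tt_apply(cores, x)                                   # plain TT layer
y_nonlinear = tt_apply(cores, x, lambda z: np.maximum(z, 0.0))  # ReLU between cores

n_tt_params = sum(core.size for core in cores)   # parameters stored in TT form
n_dense_params = tt_to_full(cores).size          # parameters of the dense matrix
```

Without an activation, `tt_apply` gives the same result as multiplying by the dense matrix from `tt_to_full`, while storing only the cores. With an activation the layer is no longer equivalent to any single dense matrix, which is the extra expressiveness the nonlinear variant aims for.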

TDLC: Tensor decomposition‐based direct learning‐compression algorithm for DNN model compression

DCCD: Reducing Neural Network Redundancy Via Distillation

Towards Efficient Tensor Decomposition-Based DNN Model Compression with Optimization Framework

Deep Convolutional Neural Network Compression Method: Tensor Ring Decomposition with Variational Bayesian Approach

DNN Compression Approach Based on Bayesian Optimization Tensor Ring Decomposition

Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization

Hybrid Tensor Decomposition in Neural Network Compression

Compressing 3DCNNs based on tensor train decomposition

Deep neural network compression by Tucker decomposition with nonlinear response

Semi-tensor Product-based Tensor Decomposition for Neural Network Compression

On Compressing Deep Models by Low Rank and Sparse Decomposition

Holistic CNN Compression Via Low-Rank Decomposition with Knowledge Transfer

Nonlinear Tensor Train Format for Deep Neural Network Compression

STN: Scalable Tensorizing Networks via Structure-Aware Training and Adaptive Compression

A Model Compression Method Using Significant Data and Knowledge Distillation

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models

Dictionary Pair-based Data-Free Fast Deep Neural Network Compression

Convolutional Neural Network Compression Based on Low-Rank Decomposition

MPDCompress - Matrix Permutation Decomposition Algorithm for Deep Neural Network Compression

Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition

Tensorial Neural Networks: Generalization of Neural Networks and Application to Model Compression