Abstract: Most state-of-the-art convolutional neural networks (CNNs) are heavily over-parameterised, which leads to a high computational burden. Tensor decomposition has emerged as a model reduction technique for compressing deep neural networks. Previous approaches to compressing CNNs have relied predominantly on either Tucker decomposition or Canonical Polyadic (CP) decomposition. However, although CP decomposition achieves much stronger compression than Tucker decomposition, it also incurs a more pronounced accuracy loss. This paper introduces an efficient model compression method, termed TEC-CNN, designed to achieve significant compression while preserving accuracy comparable to that of the original models. TEC-CNN first analyses a given model under the principles of low-rank tensor decomposition to identify its convolutional layers and extract their kernels, and then calculates the rank of each kernel. Next, an efficient decomposition scheme is proposed that approximates each kernel tensor with fewer parameters. A novel convolutional sequence, constructed with a reduced number of parameters, then replaces the original convolutional layers. Finally, the effectiveness of TEC-CNN is assessed across a range of computer vision tasks. For instance, in CIFAR-100 classification, ResNet18 is compressed to 4.1 MB, while UNeXt, applied to image segmentation on the International Skin Imaging Collaboration (ISIC) dataset, is reduced to 3.419 MB. When employed for fire object detection with YOLOv7, TEC-CNN achieves a model size reduction of 71.6 MB. Comprehensive experimental results show that the approach achieves significant model compression while preserving model performance.
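The general idea of replacing one convolutional layer with a cheaper sequence of layers can be illustrated with a generic low-rank factorisation. The sketch below is not the TEC-CNN scheme, whose details the abstract does not give. It assumes a plain truncated SVD of the flattened kernel, and the function names `low_rank_split` and `reconstruct`, the kernel shape, and the rank are hypothetical choices made for illustration only.

```python
import numpy as np

def low_rank_split(kernel, rank):
    """Split a (C_out, C_in, k, k) kernel into two smaller kernels.

    Assumption: a plain truncated SVD, not the paper's decomposition.
    The result corresponds to a k x k convolution with `rank` output
    channels followed by a 1 x 1 convolution with C_out output channels.
    """
    c_out, c_in, kh, kw = kernel.shape
    mat = kernel.reshape(c_out, c_in * kh * kw)      # flatten to a matrix
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    first = (np.diag(s[:rank]) @ vt[:rank]).reshape(rank, c_in, kh, kw)
    second = u[:, :rank].reshape(c_out, rank, 1, 1)
    return first, second

def reconstruct(first, second):
    """Rebuild the approximate full kernel implied by the two factors."""
    rank = first.shape[0]
    c_out = second.shape[0]
    mat = second.reshape(c_out, rank) @ first.reshape(rank, -1)
    return mat.reshape(c_out, *first.shape[1:])

rng = np.random.default_rng(0)
kernel = rng.standard_normal((64, 32, 3, 3))         # hypothetical layer
first, second = low_rank_split(kernel, rank=16)

original_params = kernel.size                        # 64 * 32 * 3 * 3
compressed_params = first.size + second.size         # 16*32*3*3 + 64*16
rel_error = (np.linalg.norm(kernel - reconstruct(first, second))
             / np.linalg.norm(kernel))
print(original_params, compressed_params, rel_error)
```

In this toy setting the two factors hold 5,632 parameters against 18,432 in the original kernel. A random kernel has no low-rank structure, so the approximation error here is large; trained kernels are typically more compressible, which is the trade-off between rank and accuracy that the abstract refers to.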

T1000: Mitigating the Memory Footprint of Convolution Neural Networks with Decomposition and Re-Fusion

TEC-CNN: Towards Efficient Compressing Convolutional Neural Nets with Low-rank Tensor Decomposition

Sensitivity-Oriented Layer-Wise Acceleration and Compression for Convolutional Neural Network

Layer-Wise Training To Create Efficient Convolutional Neural Networks

LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory Reduction

ReDistill: Residual Encoded Distillation for Peak Memory Reduction

An Efficient Kernel Transformation Architecture for Binary- and Ternary-Weight Neural Network Inference

Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs

Splitting Convolutional Neural Network Structures for Efficient Inference

Sparse Kronecker Canonical Polyadic Decomposition for Convolutional Neural Networks Compression

Separable Binary Convolutional Neural Network on Embedded Systems

InceptionNeXt: When Inception Meets ConvNeXt

Improved JPEG Lossless Compression for Compression of Intermediate Layers in Neural Networks Based on Compute-In-Memory

Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network

Compression of Deep Neural Networks based on quantized tensor decomposition to implement on reconfigurable hardware platforms

Sparsing Deep Neural Network Using Semi-Discrete Matrix Decomposition

Hybrid Tensor Decomposition in Neural Network Compression

Accelerating Convolutional Neural Networks by Removing Interspatial and Interkernel Redundancies

RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations

Layer-Wise Mixed-Modes CNN Processing Architecture With Double-Stationary Dataflow and Dimension-Reshape Strategy

Deep neural network compression by Tucker decomposition with nonlinear response