Abstract: Although convolutional neural network (CNN) models have greatly advanced many fields, their enormous numbers of parameters and computations pose significant performance and energy challenges for hardware implementations. Transferred filter-based methods, which have not yet been explored in the architecture domain, are very promising techniques that can substantially compress CNN models. However, their straightforward hardware implementation inherently incurs massive redundant computations, causing significant energy and time consumption. In this work, a highly efficient transferred filter-based engine (TFE) is developed to alleviate this deficiency, compressing and accelerating CNN models. First, the filters of CNN models are flexibly transferred according to specific tasks to reduce the model size. Then, two hardware-friendly mechanisms are proposed in the TFE to remove the duplicate computations caused by transferred filters, which further accelerates transferred CNN models. The first mechanism exploits the shared weights hidden in each row of the transferred filters and reuses the corresponding identical partial sums, eliminating at least 25% of the repetitive computations in each row. The second mechanism intelligently schedules and accesses the memory system to reuse the repetitive partial sums among different rows of the transferred filters, eliminating at least 25% of the computations. Furthermore, an efficient hardware architecture is proposed in the TFE to fully reap the benefits of the two mechanisms while flexibly supporting different types of networks. To achieve high energy efficiency, a sub-array-based filter mapping method (SAFM) is proposed, in which the processing element (PE) sub-array serves as the elementary computational unit to support various filters.
In SAFM, input data can be efficiently broadcast within each PE sub-array, and the load can be stripped from each PE and substantially alleviated, which dramatically reduces area and power consumption. With the exception of MobileNet-like networks that adopt depth-wise convolution, most mainstream networks can be compressed and accelerated by the proposed TFE. Two state-of-the-art transferred filter-based methods, doubly CNN and symmetry CNN, are implemented on the TFE. Compared with Eyeriss, average speedups of 2.93× and 3.17× are achieved in the convolutional layers of various modern CNNs, and the overall energy efficiency is improved by 12.66× and 13.31× on average. Compared with other state-of-the-art related works, the TFE achieves up to a 4.0× parameter reduction, a 2.72× speedup, and a 10.74× energy-efficiency improvement on VGGNet.
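Why transferred filters create repeated partial sums can be shown with a toy sketch. It is not the TFE dataflow and does not reproduce the paper's mechanisms. Several assumptions here are ours, not the paper's. Each transferred filter row is modeled as a list of indices into one shared base weight row. The doubly-CNN-style case cuts two 3-tap windows from a 4-tap meta row. The symmetry-style case assumes a horizontal flip as the transfer operation. The names `transferred_rows` and `reference` and all numbers are hypothetical. Since the product `base[i] * x[t]` depends only on the pair `(i, t)`, it can be computed once and reused wherever it recurs.

```python
def transferred_rows(x, base, index_maps):
    """Correlate x with filters given as index lists into `base` (toy model).

    Filter s has weights base[index_maps[s][j]]. The product base[i] * x[t]
    depends only on (i, t), so it is computed once and then reused by every
    filter and output position that needs it.
    Returns (outputs per filter, distinct multiplies, naive multiplies).
    """
    cache = {}
    outputs = []
    naive = 0
    for idx in index_maps:
        k = len(idx)
        row = []
        for p in range(len(x) - k + 1):
            acc = 0
            for j, i in enumerate(idx):
                key = (i, p + j)  # (base weight index, input index)
                if key not in cache:
                    cache[key] = base[i] * x[p + j]
                acc += cache[key]  # additions are NOT reduced in this sketch
                naive += 1         # multiplies a reuse-free version would do
            row.append(acc)
        outputs.append(row)
    return outputs, len(cache), naive


def reference(x, f):
    """Plain valid 1-D correlation, used to check the cached version."""
    k = len(f)
    return [sum(f[j] * x[p + j] for j in range(k)) for p in range(len(x) - k + 1)]


x = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical input row

# Doubly-CNN-style (assumed): two 3-tap windows cut from one 4-tap meta row.
meta = [2, -1, 3, 5]
doubly_maps = [[0, 1, 2], [1, 2, 3]]

# Symmetry-style (assumed): a 3-tap row and its horizontal flip.
w = [2, -1, 3]
flip_maps = [[0, 1, 2], [2, 1, 0]]

for base, maps in [(meta, doubly_maps), (w, flip_maps)]:
    outs, distinct, naive = transferred_rows(x, base, maps)
    for idx, out in zip(maps, outs):
        assert out == reference(x, [base[i] for i in idx])
    print(naive, distinct)
# prints:
# 36 26
# 36 22
```

With these sizes, the shared products cut the multiplications from 36 to 26 and 22. These toy counts follow from the chosen lengths and are unrelated to the 25% figures reported for the TFE. A hardware engine would schedule the reuse in its dataflow rather than keep a lookup table.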

TECO: A Unified Feature Map Compression Framework Based on Transform and Entropy

STC: Significance-aware Transform-based Codec Framework for External Memory Access Reduction

TFE: Energy-efficient Transferred Filter-based Engine to Compress and Accelerate Convolutional Neural Networks

ASC: Adaptive Scale Feature Map Compression for Deep Neural Network

2C-Net: integrate image compression and classification via deep neural network

Hybrid Tensor Decomposition in Neural Network Compression

Efficient feature transform module

Semi-tensor Product-based Tensor Decomposition for Neural Network Compression

Toward Intelligent Sensing: Intermediate Deep Feature Compression

A Unified End-to-End Framework for Efficient Deep Image Compression

DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework

End-to-end optimized image compression with the frequency-oriented transform

A Deep Image Compression Framework for Face Recognition

TensorCodec: Compact Lossy Compression of Tensors without Strong Data Assumptions

Deep neural network compression by Tucker decomposition with nonlinear response

NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation

Towards Efficient Tensor Decomposition-Based DNN Model Compression with Optimization Framework

Towards Analysis-Friendly Face Representation with Scalable Feature and Texture Compression

Auto-Tiler: Variable-Dimension Autoencoder with Tiling for Compressing Intermediate Feature Space of Deep Neural Networks for Internet of Things