Abstract: Vision Transformers (ViTs) have recently made a splash in the computer vision domain and achieved state-of-the-art results on many vision tasks. Nevertheless, because of their large model size and high computational cost, few transformer-based models are adopted in real-world applications. Since the computational cost of the attention operation grows quadratically with the input size, several compression methods for the Multi-Head Self-Attention (MHSA) module have been proposed; they reduce its FLOPs successfully but barely reduce the number of parameters. Meanwhile, the Feed-Forward Network (FFN) module has considerably more parameters and computational cost than the MHSA, yet its compression has not been explored in depth. Consequently, we focus on compressing the FFN layer and present a pruning method named Multi-Dimension Compression of Feed-Forward Network in Vision Transformers (MCF), which greatly reduces the model's parameters and computational cost. First, we identify the critical elements in the output of the FFN module and use them to guide irregular sparsity in this layer, recognizing the insignificant elements of the FFN layer that have less impact on the output. Next, to discard these insignificant elements, we transform the irregular sparsity into regular sparsity and prune them, thus reducing the model's parameters and obtaining a substantial speed-up during inference. Extensive results on ImageNet-1K validate the effectiveness of the proposed method, which achieves significant reductions in parameters and computational cost with almost unimpaired generalization. For example, we compress DeiT-Tiny with a 42% reduction in FLOPs and a 33% reduction in parameters, with almost no loss of accuracy on ImageNet. Furthermore, we verify the effectiveness of our method on a downstream task, using the pruned DeiT-Small as the backbone for object detection on the COCO dataset, where it yields efficiency gains without compromising detection performance.
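The abstract makes two technical claims. First, the FFN outweighs the MHSA in parameters and compute. Second, an output-guided, element-wise (irregular) importance can be turned into neuron-level (regular) pruning. Both can be illustrated with a small pure-Python sketch, which is not MCF itself.

- **Cost model.** It assumes a standard ViT block with biases and an MLP ratio of 4. It is evaluated at DeiT-Tiny-like sizes (d = 192, 197 tokens).
- **Importance score.** Each weight is scored as |w2[i][j]| times the mean absolute hidden activation, and these scores are summed per hidden neuron. This is a hypothetical stand-in chosen for illustration.
- **Helper names.** The helpers `vit_block_costs`, `neuron_scores`, and `prune_ffn` are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def vit_block_costs(d: int, n_tokens: int, mlp_ratio: int = 4) -> Tuple[int, int, int, int]:
    """Parameters and multiply-accumulates (MACs) of one standard ViT block (assumed layout)."""
    hidden = mlp_ratio * d
    mhsa_params = 4 * d * d + 4 * d           # qkv (3d x d) + proj (d x d), with biases
    ffn_params = 2 * d * hidden + hidden + d  # fc1 (hidden x d) + fc2 (d x hidden), with biases
    # Projections are linear in N; attention scores and weighted sum are quadratic in N.
    mhsa_macs = 4 * n_tokens * d * d + 2 * n_tokens * n_tokens * d
    ffn_macs = 2 * n_tokens * d * hidden
    return mhsa_params, ffn_params, mhsa_macs, ffn_macs


def gelu(x: float) -> float:
    # Exact GELU, x * Phi(x), as used in ViT FFNs.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def hidden_acts(x: Sequence[float], w1: Matrix, b1: Sequence[float]) -> List[float]:
    # fc1 followed by GELU for one token vector x.
    return [gelu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]


def ffn(x: Sequence[float], w1: Matrix, b1: Sequence[float],
        w2: Matrix, b2: Sequence[float]) -> List[float]:
    # Full FFN: fc2(GELU(fc1(x))).
    h = hidden_acts(x, w1, b1)
    return [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(w2, b2)]


def neuron_scores(tokens: Sequence[Sequence[float]], w1: Matrix,
                  b1: Sequence[float], w2: Matrix) -> List[float]:
    # Hypothetical output-aware score (an illustration, not the MCF criterion).
    # Irregular step: weight w2[i][j] scores |w2[i][j]| * mean|h_j|, its average
    # contribution to output element i.
    # Regular step: sum those scores down column j, giving one score per hidden
    # neuron so whole neurons, not scattered weights, can be removed.
    hidden = len(w1)
    mean_abs_h = [0.0] * hidden
    for x in tokens:
        for j, hj in enumerate(hidden_acts(x, w1, b1)):
            mean_abs_h[j] += abs(hj) / len(tokens)
    return [sum(abs(row[j]) * mean_abs_h[j] for row in w2) for j in range(hidden)]


def prune_ffn(w1: Matrix, b1: Sequence[float], w2: Matrix,
              scores: Sequence[float], keep: int):
    # Keep the `keep` highest-scoring hidden neurons in their original order:
    # drop whole rows of fc1 (and entries of b1) plus the matching fc2 columns.
    kept = sorted(sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:keep])
    new_w1 = [w1[j] for j in kept]
    new_b1 = [b1[j] for j in kept]
    new_w2 = [[row[j] for j in kept] for row in w2]
    return new_w1, new_b1, new_w2, kept


# DeiT-Tiny-like sizes (assumed): d = 192, 196 patches + 1 class token.
costs = vit_block_costs(192, 197)  # (148224, 295872, 43951488, 58097664)

# Toy FFN: d = 2, hidden = 3; neuron 2 has an all-zero fc2 column.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b2 = [0.0, 0.0]
tokens = [[1.0, 2.0], [0.5, -1.0]]
scores = neuron_scores(tokens, w1, b1, w2)
pw1, pb1, pw2, kept = prune_ffn(w1, b1, w2, scores, keep=2)
print(kept)  # [0, 1]
```

Under these assumed sizes, one block's FFN holds about twice the MHSA's parameters (295,872 vs. 148,224). It also needs more MACs (about 58.1M vs. 44.0M). The MHSA's quadratic attention term overtakes the FFN only once the token count N exceeds 2d. In the toy FFN, neuron 2 never reaches the output, so it scores zero and is removed without changing any output. Scoring per neuron rather than per weight is what keeps the pruned matrices dense and therefore faster at inference.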

A Compression Pipeline for One-Stage Object Detection Model

Effective Pipeline for Compressing Deep Object Detectors

Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing

An Efficient Compressive Convolutional Network for Unified Object Detection and Image Compression

A survey of model compression strategies for object detection

MCMC: Multi-Constrained Model Compression Via One-Stage Envelope Reinforcement Learning

Multi-Dimension Compression of Feed-Forward Network in Vision Transformers

Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

Small Object Detection Based on Modified FSSD and Model Compression

Channel Pruning and Quantization-Based Learning for Object Detection with Computing Source Limited Application

Masked Feature Compression for Object Detection

Group channel pruning and spatial attention distilling for object detection

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Research on the Construction of an Efficient and Lightweight Online Detection Method for Tiny Surface Defects through Model Compression and Knowledge Distillation

Compressive Sensing Based Convolutional Neural Network for Object Detection

Feature Compression for Rate Constrained Object Detection on the Edge

Pruning at a Glance: Global Neural Pruning for Model Compression

Model Compression for Deep Neural Networks: A Survey

Downscaling and Overflow-aware Model Compression for Efficient Vision Processors

Developing a Compressed Object Detection Model based on YOLOv4 for Deployment on Embedded GPU Platform of Autonomous System

DPNet: Dual-Path Network for Real-Time Object Detection With Lightweight Attention