Accelerating ViT Inference on FPGA through Static and Dynamic Pruning

Dhruv Parikh, Shouyi Li, Bingyi Zhang, Rajgopal Kannan, Carl Busart, Viktor Prasanna
2024-04-12
Abstract: Vision Transformers (ViTs) have achieved state-of-the-art accuracy on various computer vision tasks. However, their high computational complexity prevents them from being applied to many real-world applications. Weight and token pruning are two well-known methods for reducing complexity: weight pruning reduces the model size and associated computational demands, while token pruning further dynamically reduces the computation based on the input. Combining these two techniques should significantly reduce computational complexity and model size; however, naively integrating them results in irregular computation patterns, leading to significant accuracy drops and difficulties in hardware acceleration.
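The abstract contrasts static weight pruning, which is applied once and is independent of the input, with dynamic token pruning, which depends on each input. The minimal Python sketch below makes that contrast concrete for a single linear layer. It is not the paper's method. It assumes plain unstructured magnitude pruning for the weights and top-k selection over hypothetical per-token importance scores. (Deriving such scores from attention is a common choice, but here that is an assumption.) The function names `magnitude_prune`, `prune_tokens`, and `sparse_linear`, along with all toy values, are illustrative only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def magnitude_prune(weights: Matrix, sparsity: float) -> Matrix:
    """Static (input-independent) pruning: zero the `sparsity` fraction of
    weights with the smallest magnitude. Done once, before deployment."""
    rows, cols = len(weights), len(weights[0])
    order = sorted((abs(weights[r][c]), r, c) for r in range(rows) for c in range(cols))
    k = int(rows * cols * sparsity)  # number of weights to remove
    pruned = [row[:] for row in weights]
    for _, r, c in order[:k]:
        pruned[r][c] = 0.0
    return pruned


def prune_tokens(tokens: Matrix, scores: List[float], keep_ratio: float) -> Matrix:
    """Dynamic (input-dependent) pruning: keep the highest-scoring tokens.
    Scores change with every input, so which tokens survive is only known
    at run time."""
    k = max(1, int(len(tokens) * keep_ratio))
    keep = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    keep.sort()  # preserve the original token order
    return [tokens[i] for i in keep]


def sparse_linear(tokens: Matrix, weights: Matrix) -> Tuple[Matrix, int]:
    """Compute y = x @ W for each token, skipping zero weights.
    Returns the outputs and the number of multiply-accumulates performed."""
    d_in, d_out = len(weights), len(weights[0])
    outputs, macs = [], 0
    for x in tokens:
        y = [0.0] * d_out
        for i in range(d_in):
            for j in range(d_out):
                if weights[i][j] != 0.0:
                    y[j] += x[i] * weights[i][j]
                    macs += 1
        outputs.append(y)
    return outputs, macs


# Hypothetical toy layer: 4 tokens of width 4 and a 4x4 weight matrix.
W = [[0.9, -0.1, 0.4, 0.05],
     [-0.7, 0.2, 0.03, 0.6],
     [0.15, -0.8, 0.5, 0.01],
     [0.3, 0.02, -0.25, 0.95]]
X = [[1.0, 2.0, 3.0, 4.0],
     [0.5, 0.5, 0.5, 0.5],
     [2.0, 0.0, 1.0, 0.0],
     [0.1, 0.2, 0.3, 0.4]]
scores = [0.9, 0.1, 0.6, 0.2]  # hypothetical per-token importance scores

W_sparse = magnitude_prune(W, sparsity=0.5)
X_kept = prune_tokens(X, scores, keep_ratio=0.5)

_, dense_macs = sparse_linear(X, W)                 # 4 tokens x 16 weights = 64
_, weight_pruned_macs = sparse_linear(X, W_sparse)  # 4 tokens x 8 weights  = 32
_, both_macs = sparse_linear(X_kept, W_sparse)      # 2 tokens x 8 weights  = 16
print(dense_macs, weight_pruned_macs, both_macs)
```

In this toy case the two savings compound: 64 dense multiply-accumulates drop to 16. The sketch also hints at the irregularity the abstract mentions. After unstructured pruning, the surviving weights sit in different columns in each row. The number of kept tokens is also set by the input-dependent scores rather than fixed in advance, so neither dimension maps onto a regular dense schedule.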
Distributed, Parallel, and Cluster Computing; Hardware Architecture; Computer Vision and Pattern Recognition