HAPM -- Hardware Aware Pruning Method for CNN hardware accelerators in resource constrained devices

Federico Nicolas Peccia, Luciano Ferreyro, Alejandro Furfaro

2024-08-26

Abstract: In recent years, Convolutional Neural Networks (CNNs) have become increasingly popular, expanding their range of application to several areas. In particular, the image processing field has advanced remarkably thanks to these algorithms. In IoT, a wide research field aims to develop hardware capable of executing them at the lowest possible energy cost while keeping image inference time acceptable. These apparently conflicting objectives can be reconciled by applying design and training techniques. The present work proposes a generic hardware architecture ready to be implemented on FPGA devices. It supports a wide range of configurations, which allows the system to run different neural network architectures, and it dynamically exploits the sparsity that pruning techniques introduce in the mathematical operations of these algorithms. The inference speed of the design is evaluated on different resource-constrained FPGA devices. Finally, the standard pruning algorithm is compared against a custom pruning technique specifically designed to exploit the scheduling properties of this hardware accelerator. We demonstrate that our hardware-aware pruning algorithm achieves a 45% improvement in inference time compared to a network pruned using the standard algorithm.

Hardware Architecture, Artificial Intelligence

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper aims to address the following issues:

1. **Acceleration of Convolutional Neural Networks (CNNs) on Resource-Constrained Devices**
   - For resource-constrained devices (such as low-power devices), a novel hardware accelerator architecture is proposed, based on a small reusable Systolic Array design. This architecture can be implemented on various FPGA devices and has been validated for its performance on ResNet-type neural networks.
2. **Hardware-Aware Pruning Method (HAPM)**
   - A customized pruning technique called the Hardware-Aware Pruning Method (HAPM) is proposed. This method leverages the scheduling properties of the accelerator to optimize weight pruning in CNNs. Compared to standard pruning techniques, HAPM can significantly improve inference speed, reducing inference time by approximately 45%, while maintaining high accuracy.

Through these techniques and methods, the paper demonstrates how to efficiently perform CNN inference tasks on resource-constrained hardware platforms.
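To make the contrast between standard and hardware-aware pruning concrete, here is a minimal Python sketch. It is not the HAPM algorithm itself, whose details this page does not give. It rests on an assumption made purely for illustration: the accelerator consumes weights in fixed-size groups and can skip a group only when every weight in it is zero. The function names (`magnitude_prune`, `group_prune`, `skippable_groups`) and the group size are hypothetical.

```python
import random


def magnitude_prune(weights, sparsity):
    """Standard unstructured pruning: zero the smallest-magnitude weights."""
    k = round(sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def group_prune(weights, group_size, sparsity):
    """Group-wise pruning: zero whole groups with the smallest L1 norm.

    Assumption (not from the paper): one group corresponds to one unit of
    work that the hardware scheduler can skip when it is entirely zero.
    """
    if len(weights) % group_size:
        raise ValueError("len(weights) must be a multiple of group_size")
    n_groups = len(weights) // group_size
    k = round(sparsity * n_groups)

    def l1_norm(g):
        return sum(abs(w) for w in weights[g * group_size:(g + 1) * group_size])

    order = sorted(range(n_groups), key=l1_norm)
    pruned = list(weights)
    for g in order[:k]:
        for i in range(g * group_size, (g + 1) * group_size):
            pruned[i] = 0.0
    return pruned


def skippable_groups(weights, group_size):
    """Count groups whose weights are all zero, i.e. work that could be skipped."""
    return sum(
        all(w == 0.0 for w in weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    w = [rng.gauss(0.0, 1.0) for _ in range(128)]
    unstructured = magnitude_prune(w, 0.5)
    grouped = group_prune(w, 8, 0.5)
    # Both variants zero the same number of weights, but only the grouped
    # variant guarantees that the zeros line up into fully skippable groups.
    print("skippable groups (unstructured):", skippable_groups(unstructured, 8))
    print("skippable groups (grouped):     ", skippable_groups(grouped, 8))
```

At equal sparsity the two variants remove the same number of weights, yet under the stated assumption only the grouped variant turns every removed weight into skipped work. This is the general intuition behind pruning with the scheduler in mind; the grouping the authors actually use may differ.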


A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability.

Single-shot Pruning and Quantization for Hardware-Friendly Neural Network Acceleration

A High-Speed CNN Hardware Accelerator with Regular Pruning

A Hardware-Friendly High-Precision CNN Pruning Method and Its FPGA Implementation

A High-Performance Hardware Accelerator for Sparse Convolutional Neural Network on FPGA

High Performance CNN Accelerators Based on Hardware and Algorithm Co-Optimization

3D CNN Acceleration on FPGA using Hardware-Aware Pruning

An Energy-Efficient Implementation of Group Pruned CNNs on FPGA

CNN Acceleration based on Dynamic Pruning and FPGAs Implementation

FPGA Accelerator for CNN: an Exploration of the Kernel Structured Sparsity and Hybrid Arithmetic Computation

DPACS: Hardware Accelerated Dynamic Neural Network Pruning Through Algorithm-Architecture Co-design.

WPU: A FPGA-based Scalable, Efficient and Software/Hardware Co-design Deep Neural Network Inference Acceleration Processor

A Mixed-Pruning Based Framework for Embedded Convolutional Neural Network Acceleration.

An Efficient FPGA Accelerator Optimized for High Throughput Sparse CNN Inference.

An algorithm/hardware co‐optimized method to accelerate CNNs with compressed convolutional weights on FPGA

HILP: hardware-in-loop pruning of convolutional neural networks towards inference acceleration

Hardware-Friendly 3D CNN Acceleration with Balanced Kernel Group Sparsity

High PE Utilization CNN Accelerator with Channel Fusion Supporting Pattern-Compressed Sparse Neural Networks

Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks

A Sparse CNN Accelerator for Eliminating Redundant Computations in Intra- and Inter-Convolutional/Pooling Layers