Flare: An FPGA-Based Full Precision Low Power CNN Accelerator with Reconfigurable Structure

Yuhua Xu, Jie Luo, Wei Sun
DOI: https://doi.org/10.3390/s24072239
IF: 3.9
2024-04-01
Sensors
Abstract: Convolutional neural networks (CNNs) have significantly advanced various fields; however, their computational demands and power consumption have escalated, posing challenges for deployment in low-power scenarios. To address this issue and facilitate the application of CNNs in power-constrained environments, the development of dedicated CNN accelerators is crucial. Prior research has predominantly concentrated on developing low-precision CNN accelerators using code generated from high-level synthesis (HLS) tools. Unfortunately, these approaches often fail to efficiently utilize the computational resources of field-programmable gate arrays (FPGAs) and do not extend well to full-precision scenarios. To overcome these limitations, we integrate vector dot products to unify the convolution and fully connected layers. By treating the row vector of input feature maps as the fundamental processing unit, we balance processing latency and resource consumption while eliminating data rearrangement time. Furthermore, an accurate design space exploration (DSE) model is established to identify the optimal design point for each CNN layer, and dynamic partial reconfiguration is employed to maximize each layer's access to computational resources. Our approach is validated through the implementation of AlexNet and VGG16 on the 7A100T and ZU15EG platforms, respectively. We achieve average full-precision convolutional layer throughputs of 28.985 GOP/s and 246.711 GOP/s on these two platforms. Notably, the proposed accelerator demonstrates remarkable power efficiency, with maximum improvements of 23.989× and 15.376× compared to current state-of-the-art FPGA implementations.
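The abstract's central idea, that convolutional and fully connected layers can both be reduced to vector dot products computed over row vectors of the input feature map, can be illustrated in software. This is a minimal Python model, not the paper's hardware dataflow: it assumes stride 1, no padding, and a single output channel, and the names `dot`, `conv2d_rowwise`, and `fully_connected` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    """Shared primitive: the only arithmetic both layer types need."""
    if len(a) != len(b):
        raise ValueError("dot product needs equal-length vectors")
    return sum(x * y for x, y in zip(a, b))


def conv2d_rowwise(fmap: List[Matrix], kernels: List[Matrix]) -> Matrix:
    """Valid convolution (stride 1, no padding, one output channel).

    fmap[c][h] is row h of input channel c; kernels[c][i] is row i of the
    filter for channel c. Each input row is fetched once and used directly,
    so no im2col-style rearranged copy of the input is built.
    """
    channels, k = len(fmap), len(kernels[0])
    height, width = len(fmap[0]), len(fmap[0][0])
    out_h, out_w = height - k + 1, width - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for c in range(channels):
        for h in range(height):
            row = fmap[c][h]  # the input row vector is the unit of work
            for i in range(k):
                r = h - i  # output row that this input row feeds via kernel row i
                if 0 <= r < out_h:
                    for col in range(out_w):
                        out[r][col] += dot(row[col:col + k], kernels[c][i])
    return out


def fully_connected(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """An FC layer is one dot product per output neuron."""
    return [dot(x, w) + b for w, b in zip(weights, bias)]
```

For example, with the 3×3 input `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` and the 2×2 kernel `[[1, 0], [0, 1]]`, `conv2d_rowwise` returns `[[6.0, 8.0], [12.0, 14.0]]`. When the kernel is as large as the input, the convolution collapses to a single dot product identical to one FC neuron over the flattened input, which is why a single dot-product engine can serve both layer types.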
Subject categories: Engineering, Electrical & Electronic; Chemistry, Analytical; Instruments & Instrumentation
What problem does this paper attempt to address?
The paper addresses the difficulty of deploying Convolutional Neural Networks (CNNs) in low-power scenarios. It proposes Flare, a full-precision, low-power CNN accelerator based on Field-Programmable Gate Arrays (FPGAs), to overcome the low resource utilization and long processing delays of existing methods. Specifically, the paper addresses the following issues:

1. **Improving resource utilization and reducing processing delays**: Traditional FPGA-based designs often generate code with High-Level Synthesis (HLS) tools. These methods enable rapid implementation but usually cannot fully utilize the FPGA's computational resources, especially for full-precision operations. Flare uses vector dot products to unify the computation of convolutional and fully connected layers, and employs a Design Space Exploration (DSE) model to choose configuration parameters, which improves resource utilization and reduces processing delays.
2. **Adapting to the needs of different layers**: The DSE model identifies an optimal design point for each CNN layer, and dynamic partial reconfiguration is applied at runtime so that each layer can make maximal use of the FPGA's computational resources.
3. **Addressing bandwidth limitations**: Given the limited external memory bandwidth of FPGA platforms, Flare adopts caching strategies and treats row vectors of the input feature maps as the basic processing unit, which avoids data rearrangement time, reduces unnecessary data transfers, and improves bandwidth utilization.

In summary, Flare is an FPGA accelerator design for full-precision CNNs that aims to overcome the limitations of existing methods and enable efficient, flexible CNN deployment in low-power environments.
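Per-layer design space exploration can be illustrated with a toy model. The paper's actual DSE model is not described here, so everything below is an assumption: the two parallelism factors `p_in` and `p_out`, the cycle formula (fully pipelined, memory stalls ignored), the per-MAC DSP cost `dsp_per_mac`, and all names are hypothetical. The sketch enumerates candidate factors under a DSP budget and keeps the lowest-latency point for one layer.

```python
from dataclasses import dataclass
from math import ceil
from typing import Optional, Sequence


@dataclass(frozen=True)
class ConvLayer:
    """Shape of one convolutional layer (output size already known)."""
    in_ch: int
    out_ch: int
    out_h: int
    out_w: int
    k: int  # square kernel size


@dataclass(frozen=True)
class DesignPoint:
    p_in: int    # hypothetical: input channels each unit consumes per cycle
    p_out: int   # hypothetical: output channels computed in parallel
    cycles: int  # estimated latency of the layer at this design point


def estimate_cycles(layer: ConvLayer, p_in: int, p_out: int) -> int:
    """Toy latency model (an assumption, not the paper's model): p_out units
    each perform p_in MACs per cycle, fully pipelined, with no memory stalls."""
    out_groups = ceil(layer.out_ch / p_out)
    in_groups = ceil(layer.in_ch / p_in)
    return out_groups * in_groups * layer.k * layer.k * layer.out_h * layer.out_w


def explore(layer: ConvLayer, dsp_budget: int, dsp_per_mac: int,
            candidates: Sequence[int]) -> DesignPoint:
    """Exhaustively search (p_in, p_out) pairs that fit the DSP budget and
    return the one with the lowest estimated latency."""
    best: Optional[DesignPoint] = None
    for p_in in candidates:
        for p_out in candidates:
            if p_in * p_out * dsp_per_mac > dsp_budget:
                continue  # this design would not fit on the device
            cycles = estimate_cycles(layer, p_in, p_out)
            if best is None or cycles < best.cycles:
                best = DesignPoint(p_in, p_out, cycles)
    if best is None:
        raise ValueError("no candidate design point fits the DSP budget")
    return best


if __name__ == "__main__":
    sizes = (1, 2, 4, 8, 16, 32, 64)
    for layer in (ConvLayer(3, 64, 4, 4, 3), ConvLayer(64, 3, 4, 4, 3)):
        print(layer, explore(layer, dsp_budget=64, dsp_per_mac=1, candidates=sizes))
```

With a 64-DSP budget and `dsp_per_mac=1` (a placeholder, since a full-precision floating-point MAC typically consumes more than one DSP slice), a layer with 3 input and 64 output channels favors output parallelism (`p_in=1, p_out=64`), while a layer with 64 input and 3 output channels favors the opposite (`p_in=64, p_out=1`). Differences of this kind are what make reconfiguring the accelerator per layer attractive.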