Accelerating Local Laplacian Filters on FPGAs

Shashwat Khandelwal, Ziaul Choudhury, Shashwat Shrivastava, Suresh Purini
2024-02-18
Abstract: Images processed with various enhancement techniques often suffer from edge degradation and other unwanted artifacts such as halos. These artifacts pose a major problem for photographic applications, where they can degrade the quality of an image. A plethora of edge-aware techniques have been proposed in the field of image processing. However, these require complex optimization or post-processing methods. Local Laplacian Filtering is an edge-aware image processing technique that involves the construction of simple Gaussian and Laplacian pyramids. It can be applied successfully for detail smoothing, detail enhancement, tone mapping and inverse tone mapping of an image while keeping it artifact-free. The drawback of this approach is that it is computationally expensive. Hence, parallelization schemes using multi-core CPUs and GPUs have been proposed. These platforms, however, are not power-efficient, and a well-designed hardware architecture on an FPGA can do better on the performance-per-watt metric. In this paper, we propose a hardware accelerator that fully exploits the available parallelism in the Local Laplacian Filtering algorithm while minimizing the utilization of on-chip FPGA resources. On a Virtex-7 FPGA, we obtain a 7.5x speed-up when processing a 1 MB image, compared to an optimized baseline CPU implementation. To the best of our knowledge, no other hardware accelerator for the Local Laplacian Filtering problem has been proposed in the research literature.
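Local Laplacian Filtering is built on Gaussian and Laplacian pyramids. The sketch below shows only that pyramid machinery, not the full filter, which additionally remaps the input around each Gaussian coefficient and builds a Laplacian pyramid of the remapped image. It is a minimal pure-Python illustration under stated assumptions: grayscale images as nested lists, a 5-tap binomial kernel as the Gaussian approximation, clamp-to-edge borders, and nearest-neighbour-plus-blur upsampling. All function names are hypothetical, and none of these choices is taken from the paper.

```python
from typing import List

Image = List[List[float]]

# Assumed 5-tap binomial approximation of a Gaussian; the weights sum to 1.
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def blur_1d(signal: List[float]) -> List[float]:
    """Convolve a 1-D signal with KERNEL, clamping indices at the borders."""
    n = len(signal)
    return [
        sum(w * signal[min(max(i + k - 2, 0), n - 1)] for k, w in enumerate(KERNEL))
        for i in range(n)
    ]


def blur(img: Image) -> Image:
    """Separable 2-D blur: filter every row, then every column."""
    rows = [blur_1d(r) for r in img]
    cols = [blur_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def downsample(img: Image) -> Image:
    """Blur, then keep every second row and every second column."""
    return [row[::2] for row in blur(img)[::2]]


def upsample(img: Image, h: int, w: int) -> Image:
    """Nearest-neighbour expansion to h x w, followed by a blur."""
    return blur([[img[y // 2][x // 2] for x in range(w)] for y in range(h)])


def combine(a: Image, b: Image, sign: float) -> Image:
    """Element-wise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gaussian_pyramid(img: Image, levels: int) -> List[Image]:
    """Level 0 is the input; each further level is blurred and halved."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr


def laplacian_pyramid(img: Image, levels: int) -> List[Image]:
    """Band-pass levels G_k - up(G_{k+1}); the last level is the coarsest Gaussian."""
    g = gaussian_pyramid(img, levels)
    bands = [
        combine(g[k], upsample(g[k + 1], len(g[k]), len(g[k][0])), -1.0)
        for k in range(levels - 1)
    ]
    return bands + [g[-1]]


def collapse(lap: List[Image]) -> Image:
    """Invert laplacian_pyramid: add each band to the upsampled coarser result."""
    img = lap[-1]
    for band in reversed(lap[:-1]):
        img = combine(band, upsample(img, len(band), len(band[0])), 1.0)
    return img


image = [[float((x * 7 + y * 13) % 256) for x in range(8)] for y in range(8)]
lap = laplacian_pyramid(image, 3)
print([(len(level), len(level[0])) for level in lap])  # [(8, 8), (4, 4), (2, 2)]
```

Because each band stores exactly what upsampling the coarser level loses, `collapse()` recovers the input up to floating-point rounding whatever kernel is used; the kernel only changes how detail is split across the bands.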
Image and Video Processing, Computer Vision and Pattern Recognition, Graphics, Signal Processing
What problem does this paper attempt to address?
The paper addresses a problem common to image enhancement: many techniques degrade edges and introduce unwanted artifacts such as halos. These artifacts are particularly harmful in photographic applications because they lower image quality. Many edge-aware techniques have been proposed to avoid them, but these usually rely on complex optimization or post-processing, which makes them computationally costly. The Local Laplacian Filtering (LLF) algorithm avoids artifacts effectively, yet it is also computationally expensive, which makes real-time processing of large images difficult. To address this, the paper proposes a hardware accelerator on an FPGA (Field-Programmable Gate Array) that fully exploits the parallelism in the LLF algorithm while minimizing the use of FPGA resources. On a Virtex-7 FPGA, the accelerator achieves a 7.5x speed-up over an optimized CPU implementation when processing a 1 MB image. The design is also motivated by power efficiency: the authors argue that an FPGA design can achieve better performance per watt than the multi-core CPU and GPU parallelizations proposed previously.

The main contributions of the paper are:
1. **Hardware architecture design**: a new parallel architecture that exploits both the data parallelism and the pipeline parallelism of the LLF algorithm.
2. **Convolution engine optimization**: by modifying the Gaussian filter, the authors design a convolution engine that performs high-throughput convolution using only LUT resources, without the FPGA's DSP blocks.
3. **Remapping function optimization**: the remapping function is computed as a look-up-table operation, which saves a large amount of hardware resources and eliminates the associated computational latency.
4. **Experimental verification**: a series of experiments verifies the accelerator's performance and accuracy. The results show a significant increase in processing speed while maintaining high image quality (PSNR between 30 and 50 dB) across images of different sizes.

In conclusion, the paper uses hardware acceleration to address the computational bottleneck of LLF in practical applications and offers a new route to real-time image processing.
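The convolution-engine contribution avoids the FPGA's DSP blocks, but this section does not say how the Gaussian filter is modified. One common technique, shown here purely as an assumption, is to use the binomial kernel (1, 4, 6, 4, 1)/16. Its coefficients decompose into powers of two, so every tap reduces to shifts and additions, which synthesize into LUT-based adders rather than multipliers. The function names are hypothetical and 8-bit integer pixels are assumed.

```python
from typing import List


def binomial5_shift_add(p0: int, p1: int, p2: int, p3: int, p4: int) -> int:
    """(p0 + 4*p1 + 6*p2 + 4*p3 + p4) // 16 using only shifts and additions.

    4*x is x << 2 and 6*x is (x << 2) + (x << 1), so no multiplier is needed.
    The final division by 16 is a right shift by 4 (floor, for non-negative sums).
    """
    acc = p0 + p4 + ((p1 + p3) << 2) + (p2 << 2) + (p2 << 1)
    return acc >> 4


def filter_row(row: List[int]) -> List[int]:
    """Apply the 5-tap kernel along a row of pixels with clamp-to-edge borders."""
    n = len(row)

    def px(i: int) -> int:
        # Clamp the index so border pixels are replicated.
        return row[min(max(i, 0), n - 1)]

    return [
        binomial5_shift_add(px(i - 2), px(i - 1), px(i), px(i + 1), px(i + 2))
        for i in range(n)
    ]


print(filter_row([0, 0, 16, 0, 0]))  # [1, 4, 6, 4, 1]
```

Rounding to nearest instead of truncating only requires adding 8 before the shift, still without a multiplier. Because the 5x5 binomial kernel is separable, a 2-D blur can reuse the same datapath on rows and then on columns, at the cost of an intermediate rounding step.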
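The remapping contribution replaces per-pixel evaluation of the remapping function with a table lookup. The sketch below assumes the detail/edge remapping r(i, g) of the original Local Laplacian Filters formulation, with parameters sigma_r, alpha and beta (the values used here are illustrative), and 8-bit integer intensities. Because r(i, g) - g depends only on the difference d = i - g, a single table of 2 * 255 + 1 entries indexed by d covers every (i, g) pair. Whether the accelerator indexes its table this way is an assumption. Hardware would store fixed-point entries; floats are used here for readability, and all names are hypothetical.

```python
from typing import List

MAX_VAL = 255  # assumed 8-bit intensities


def remap(i: float, g: float, sigma_r: float, alpha: float, beta: float) -> float:
    """Detail/edge remapping of intensity i around the reference value g."""
    d = i - g
    a = abs(d)
    s = 1.0 if d >= 0 else -1.0
    if a <= sigma_r:
        # Detail region: alpha < 1 amplifies small differences, alpha > 1 flattens them.
        return g + s * sigma_r * (a / sigma_r) ** alpha
    # Edge region: beta scales the amplitude beyond sigma_r.
    return g + s * (beta * (a - sigma_r) + sigma_r)


def build_remap_lut(sigma_r: float, alpha: float, beta: float) -> List[float]:
    """Entry d + MAX_VAL stores r(i, g) - g for every difference d = i - g."""
    return [remap(float(d), 0.0, sigma_r, alpha, beta) for d in range(-MAX_VAL, MAX_VAL + 1)]


def remap_lookup(i: int, g: int, lut: List[float]) -> float:
    """Table-based replacement for remap(): one subtraction, one read, one addition."""
    return g + lut[i - g + MAX_VAL]


lut = build_remap_lut(sigma_r=40.0, alpha=0.5, beta=1.0)
print(len(lut))  # 511
```

The table size is fixed by the pixel bit depth rather than the image size, which is what makes a lookup attractive on an FPGA compared with evaluating a fractional power for every pixel.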
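The PSNR range quoted for the experiments (30 to 50 dB) is computed with the standard formula PSNR = 10 log10(MAX^2 / MSE), where MAX = 255 for 8-bit images. The sketch below is a minimal implementation on nested lists. This section does not state which reference image the accelerator output is compared against, so comparing against a software result is an assumption.

```python
import math
from typing import List


def psnr(reference: List[List[float]], result: List[List[float]], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    ref = [p for row in reference for p in row]
    out = [p for row in result for p in row]
    if len(ref) != len(out) or not ref:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


ref = [[100.0, 150.0], [200.0, 250.0]]
off_by_one = [[101.0, 149.0], [201.0, 249.0]]
print(round(psnr(ref, off_by_one), 2))  # 48.13
```

For intuition, an error of one intensity level at every pixel gives about 48 dB, while 30 dB corresponds to an RMS error of roughly 8 levels (255 / 10^(30/20)).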