Acceleration and energy consumption optimization in cascading classifiers for face detection on low-cost ARM big.LITTLE asymmetric architectures

Alberto Corpas, Luis Costero, Guillermo Botella, Francisco D. Igual, Carlos García, Manuel Rodríguez
DOI: https://doi.org/10.1002/cta.2552
2024-02-06
Abstract: This paper proposes a mechanism to accelerate and optimize the energy consumption of a face detection software based on Haar-like cascading classifiers, taking advantage of the features of low-cost Asymmetric Multicore Processors (AMPs) with limited power budget. A modelling and task scheduling/allocation is proposed in order to efficiently make use of the existing features on big.LITTLE ARM processors, including: (I) source-code adaptation for parallel computing, which enables code acceleration by applying the OmpSs programming model, a task-based programming model that handles data-dependencies between tasks in a transparent fashion; (II) different OmpSs task allocation policies which take into account the processor asymmetry and can dynamically set processing resources in a more efficient way based on their particular features. The proposed mechanism can be efficiently applied to take advantage of the processing elements existing on low-cost and low-energy multi-core embedded devices executing object detection algorithms based on cascading classifiers. Although these classifiers yield the best results for detection algorithms in the field of computer vision, their high computational requirements prevent them from being used on these devices under real-time requirements. Finally, we compare the energy efficiency of a heterogeneous architecture based on asymmetric multicore processors with a suitable task scheduling, with that of a homogeneous symmetric architecture.
What problem does this paper attempt to address?
The paper addresses how to speed up face-detection software based on Haar-like cascade classifiers, and reduce its energy consumption, on low-cost, low-power ARM big.LITTLE asymmetric architectures. Specifically, its goals are:

1. **Improve the execution efficiency of the face-detection algorithm**: Exploit the parallel computing capabilities of multi-core processors, and the characteristics of the big.LITTLE architecture in particular, to increase the execution speed of the face-detection algorithm.
2. **Reduce energy consumption**: While preserving detection accuracy, reduce the energy consumed during execution, so that the algorithm can run in real time on resource-limited embedded devices (such as mobile devices).
3. **Adaptive task scheduling and allocation**: Propose a task scheduling and allocation mechanism that exploits the differences between the cores of the big.LITTLE architecture and allocates computing resources dynamically to improve both performance and energy efficiency.

To achieve these goals, the paper adopts the following methods:

- **Adapt source code to parallel computing**: Apply the OmpSs programming model to turn the originally sequential code into a task-parallel form, thereby accelerating execution.
- **Different OmpSs task allocation strategies**: Adjust task allocation dynamically according to the asymmetry of the processor, so that the computing resources of each core are used efficiently.
- **Experimental verification**: Run experiments on platforms such as Odroid XU4 and Raspberry Pi 3 B+ to compare the energy consumption and performance of heterogeneous asymmetric architectures against homogeneous symmetric ones.
Through these methods, the paper aims to provide an effective solution for real-time face detection on low-cost, low-power devices.
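To illustrate why task allocation that accounts for processor asymmetry matters, the following is a minimal simulation sketch in plain Python. It is not OmpSs code and does not reproduce the paper's scheduling policies: it compares a round-robin assignment that ignores core speed with a generic greedy earliest-finish-time heuristic. The `Core` class, the function names, the relative speeds (2.0 for big cores, 1.0 for LITTLE cores), and the task sizes are all assumptions chosen for illustration, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    speed: float  # hypothetical relative throughput (work units per second)


def round_robin_schedule(tasks, cores):
    """Assign task i to core i % len(cores), ignoring core asymmetry.

    Returns the busy time accumulated on each core.
    """
    busy = [0.0] * len(cores)
    for i, work in enumerate(tasks):
        c = i % len(cores)
        busy[c] += work / cores[c].speed
    return busy


def earliest_finish_schedule(tasks, cores):
    """Greedy heuristic: place each task on the core that finishes it first.

    Tasks are taken largest first. This is a generic list-scheduling
    heuristic, used here only as a stand-in for an asymmetry-aware policy.
    """
    busy = [0.0] * len(cores)
    for work in sorted(tasks, reverse=True):
        c = min(range(len(cores)), key=lambda k: busy[k] + work / cores[k].speed)
        busy[c] += work / cores[c].speed
    return busy


def makespan(busy):
    """Total completion time: the busiest core determines when all work ends."""
    return max(busy)


# Hypothetical platform: two fast "big" cores and two slower "LITTLE" cores.
cores = [Core("big0", 2.0), Core("big1", 2.0),
         Core("little0", 1.0), Core("little1", 1.0)]
# Hypothetical workload: eight equal tasks of 4 work units each.
tasks = [4.0] * 8

static_time = makespan(round_robin_schedule(tasks, cores))
aware_time = makespan(earliest_finish_schedule(tasks, cores))
print(static_time)  # 8.0: the LITTLE cores become the bottleneck
print(aware_time)   # 6.0: the big cores absorb more of the work
```

Under these assumed numbers, the asymmetry-blind assignment leaves the big cores idle while the LITTLE cores finish, whereas the greedy assignment shortens the total run. The sketch models execution time only; it says nothing about energy, which also depends on per-core power draw that is not modeled here.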