A Resource-efficient Lip Detector Based on Hybrid Feature Criteria and Predictive Bounding Box Tracking for Mobile HCI Applications

Kaiji Liu, Siqi Zhang, Syed Muhammad Abubakar, Nan Wu, Yanshu Guo, Zhihua Wang, Hanjun Jiang
DOI: https://doi.org/10.1109/mwscas60917.2024.10658691
2024-01-01
Abstract: Lip detection (LD) holds great promise for mobile human-computer interaction (HCI) terminals such as hearing aids, robots, and smartphones. However, such terminals suffer from the massive computation and resource overhead of mainstream models such as the Viola-Jones framework, convolutional neural networks (CNN), recurrent neural networks (RNN), and vision transformers (ViT). To solve this problem, we propose a resource-efficient lip detector (RELD) for mobile HCI applications. For lip region of interest (ROI) detection, hybrid feature criteria are constructed from the hump-like curve formed by the row-summation of the lip ROI. An L-order predictive tracking method is proposed to track the lip bounding box in continuous image flows with low computation and latency. For behavioural validation, RELD achieves a test accuracy of over 95% on a database of 204,000 images generated from the GRID dataset. To verify its hardware feasibility, an RTL implementation has been completed for 200 × 200 images read from an OV7670 image sensor, showing that RELD requires only 352 bytes of SRAM and ≤ 5000 MAC operations per frame to perform the lip detection task.
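The abstract names two ingredients, a row-summation profile and an L-order predictive tracker, without giving their formulas. The following is only a minimal sketch of what such steps might look like, not the authors' method. It assumes that the profile is the plain sum of pixel values in each row, and that "L-order" means the next box is extrapolated from the last L boxes by their mean frame-to-frame displacement. The function names `row_sum_profile` and `predict_next_box` and the `(x, y, w, h)` box layout are hypothetical.

```python
def row_sum_profile(gray):
    """Sum the pixel values of each row of a grayscale region.

    `gray` is a list of rows, each a list of integer intensities.
    A band of rows that differs from its surroundings shows up as
    a hump (or dip) in the returned 1-D profile.
    """
    return [sum(row) for row in gray]


def predict_next_box(history, order=2):
    """Extrapolate the next bounding box from the last `order` boxes.

    Assumption (not from the paper): the prediction is the latest box
    plus the mean frame-to-frame displacement over the window. The mean
    of consecutive differences telescopes to (last - first) / steps.
    `history` must contain at least one box.
    """
    recent = history[-order:]
    if len(recent) < 2:
        # Not enough history to estimate motion: reuse the latest box.
        return tuple(recent[-1])
    steps = len(recent) - 1
    return tuple(
        last + (last - first) / steps
        for first, last in zip(recent[0], recent[-1])
    )


# Synthetic 8 x 6 region: rows 3 and 4 are brighter than the rest.
region = [[10] * 6 if r in (3, 4) else [1] * 6 for r in range(8)]
profile = row_sum_profile(region)

# Three past boxes as (x, y, w, h), moving by (+2, +1) per frame.
boxes = [(10, 20, 30, 15), (12, 21, 30, 15), (14, 22, 30, 15)]
next_box = predict_next_box(boxes, order=3)
```

A predictor of this kind needs only a few additions and one division per coordinate, which is in the spirit of the low per-frame operation count reported above, although the actual RTL design may differ.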