Lightweight facial landmark detection network based on improved MobileViT

Limei Song, Chuanfei Hong, Tian Gao, Jiali Yu
DOI: https://doi.org/10.1007/s11760-023-02975-4
IF: 1.583
2024-01-19
Signal, Image and Video Processing
Abstract: Current CNN-based facial landmark detection networks cannot model long-range dependencies between facial landmarks, and they typically have many parameters that consume substantial computational resources. This paper proposes a multi-scale lightweight facial landmark detection network with parallel CNN and Transformer branches. Built on MobileViT, the network incorporates the MobileOne Block and a simplified Ghost bottleneck as lightweight structures. Compared with MobileViT on the WFLW dataset, the number of network parameters is reduced by 49.18%, the failure rate is reduced by 3.20%, the detection speed is improved by 41.73%, the FLOPs are reduced by 64.83%, and the NME is improved by 0.45% and 1.31% on the test and pose subsets, respectively. These results show that adding the Transformer structure makes the extraction of global facial landmark information more accurate. Comparisons with other networks further show that the improved MobileViT achieves more accurate detection with fewer model parameters.
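The abstract reports accuracy as NME (normalized mean error) and failure rate on WFLW. The sketch below shows how these two metrics are commonly computed; it is not taken from the paper. It assumes inter-ocular normalization (distance between two chosen eye landmarks, passed as indices because the exact WFLW indices are an assumption here) and a failure threshold of 0.10, where a face counts as a failure when its NME exceeds the threshold. The function names `nme` and `failure_rate` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def nme(pred: Sequence[Point], gt: Sequence[Point],
        left_eye_idx: int, right_eye_idx: int) -> float:
    """Mean point-to-point error of one face, normalized by inter-ocular distance.

    Assumption: normalization uses the distance between the two ground-truth
    landmarks at left_eye_idx and right_eye_idx (e.g. outer eye corners).
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    norm = math.dist(gt[left_eye_idx], gt[right_eye_idx])
    if norm == 0:
        raise ValueError("normalization distance is zero")
    # Average Euclidean error over all landmarks of this face.
    mean_err = sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
    return mean_err / norm


def failure_rate(per_face_nme: Sequence[float], threshold: float = 0.10) -> float:
    """Fraction of faces whose NME is strictly above the threshold (assumed 0.10)."""
    if not per_face_nme:
        raise ValueError("need at least one face")
    return sum(1 for e in per_face_nme if e > threshold) / len(per_face_nme)


if __name__ == "__main__":
    # Toy 3-landmark "face": eyes at indices 0 and 1, inter-ocular distance 10.
    gt = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
    pred = [(1.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
    print(nme(pred, gt, 0, 1))                        # mean error 1/3, divided by 10
    print(failure_rate([0.05, 0.12, 0.08, 0.20]))    # 2 of 4 faces exceed 0.10
```

Dataset-level NME is then the average of the per-face values. Relative improvements such as the 0.45% and 1.31% quoted above compare these averages between two models.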
Engineering, Electrical & Electronic; Imaging Science & Photographic Technology