A Robust Occlusion-Adaptive Attention-Based Deep Network for Facial Landmark Detection
Sadiq, Muhammad; Shi, D.; Liang, Junwei
DOI: https://doi.org/10.1007/s10489-021-02848-8
IF: 5.3
2022-01-01
Applied Intelligence
Abstract: The Internet of Things (IoT) has extensively transformed industry. The advent and rapid growth of 5G technology have enabled fast communication between IoT devices and the cyber domain. Technological advancement and the desire for ease of life have given rise to the concept of smart cities, for which security is a prime objective. Surveillance video management systems are rapidly expanding in scope and application. The use of 5G technology in smart cities enables the integration of real-time video observations with access to specific locations, which allows facial recognition to detect known criminals or persons of interest in a crowd. Facial landmark detection (FLD) is an essential step in facial attribute analysis, the face recognition pipeline, and face verification. Researchers currently focus on convolutional neural network (CNN) based FLD approaches, which have achieved substantial progress. However, occlusion remains the leading obstacle to accurate results from CNNs. Because attention plays a vital role in the human visual system, researchers have recently demonstrated its significance for rich feature representation in computer vision problems. In this paper, an occlusion-adaptive attentive deep network (OADN) is proposed for facial landmark detection. In short, we extend our previously established occlusion-adaptive deep network (ODN) by modifying the geometry-aware module (GM) and distillation module (DM). Our experiments show that the proposed model outperforms the current state-of-the-art methods on the available benchmark datasets. It reduces the error from 4.17 to 3.82 on the 300W Full-set dataset. After training on the Menpo dataset, the error decreases to 3.63, a 13% decrease compared with that of the ODN.
In addition, we perform a statistical analysis with a 95% confidence interval to validate the effectiveness of our proposed methodology. Our method reduces the total number of network parameters from 6.6 million to 5.46 million, an approximately 16% decrease, which reduces training time and cost and makes the model more suitable for scalable data processing. Furthermore, taking advantage of the proposed model’s inherently lightweight design, we also propose a distributive facial recognition model for 5G camera-based surveillance systems.
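
The relative reductions quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative and not taken from the paper: `relative_reduction` is a hypothetical helper, and it assumes that 4.17 is the ODN baseline error on the 300W Full set, which is consistent with the stated 13% figure.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the percentage decrease from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Figures as quoted in the abstract (the ODN baseline of 4.17 is an assumption).
ERR_ODN = 4.17          # error on 300W Full set, baseline
ERR_OADN = 3.82         # error on 300W Full set, proposed model
ERR_OADN_MENPO = 3.63   # error after additional training on Menpo
PARAMS_ODN_M = 6.6      # network parameters, millions (rounded in the abstract)
PARAMS_OADN_M = 5.46    # network parameters, millions

print(f"Error reduction (300W only):  {relative_reduction(ERR_ODN, ERR_OADN):.1f}%")
print(f"Error reduction (with Menpo): {relative_reduction(ERR_ODN, ERR_OADN_MENPO):.1f}%")
print(f"Parameter reduction:          {relative_reduction(PARAMS_ODN_M, PARAMS_OADN_M):.1f}%")
```

With the rounded values quoted above, the Menpo-trained error reduction rounds to 13%, matching the abstract. The parameter reduction computed from the rounded counts comes out closer to 17%, so the "approximately 16%" in the abstract presumably reflects unrounded parameter counts.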