Leveraging Self-Supervised Learning for Accurate Facial Keypoint Detection in Thermal Images
Zahra Bahmani, Poorya Aghaomidi, Sadaf Aram
DOI: https://doi.org/10.1109/ICBME61513.2023.10488499
2023-11-30
Abstract: Precise keypoint detection is an active area of computer vision, with applications in face recognition, facial region tracking, and facial expression analysis. Most research has focused on the visible spectrum, while infrared imaging, which is rich in physiological cues, remains largely unexplored because annotated datasets are scarce. The thermal domain offers a way to extract indicators of both mental and physical states in humans, and a key component of automatic mental state recognition is a robust face tracker that locates facial landmarks precisely. We present an algorithm for accurate facial keypoint detection in thermal images. We use a dataset manually annotated with 68 facial keypoints across 94 subjects. To overcome the limited number of samples, we adopt self-supervised learning: our Convolutional Neural Network (CNN) is pretrained on three pretext tasks, namely image rotation prediction, subject classification, and 5-point facial keypoint detection. The first two tasks reached accuracies of 100% and 97.92%, respectively, and the third reached a mean absolute percentage error of 2.37%. The knowledge gained from these tasks improved the network's representation of thermal facial data and of the features that distinguish individual faces. We then removed the fully connected layers of the pretrained CNN and fine-tuned it for keypoint detection, which yielded a clear gain in keypoint detection precision.
With this approach, the Normalized Mean Error (NME) was 1.05, compared with an NME of 3.19 for conventional fully supervised learning. This result shows that self-supervised pretraining can substantially improve the accuracy of facial keypoint detection in thermal images when annotated data are limited.
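The abstract reports NME without stating how the error is normalized. The sketch below illustrates one common convention and is not necessarily the authors' exact metric: the mean point-to-point Euclidean error per face is divided by the inter-ocular distance and expressed as a percentage. The function name `normalized_mean_error`, the default eye-corner indices 36 and 45 (outer eye corners in the widely used 0-indexed 68-point layout), and the percentage scaling are all assumptions made for illustration.

```python
import math


def normalized_mean_error(pred, true, left_eye_idx=36, right_eye_idx=45):
    """Return the NME (in percent) over a batch of faces.

    pred, true: lists of faces; each face is a list of (x, y) keypoints.
    Normalization by inter-ocular distance and the default indices are
    assumptions; the source abstract does not specify them.
    """
    if len(pred) != len(true) or not true:
        raise ValueError("pred and true must be non-empty and equally long")

    total = 0.0
    for pred_face, true_face in zip(pred, true):
        if len(pred_face) != len(true_face):
            raise ValueError("each face needs the same number of keypoints")
        # Normalizer: distance between the two reference eye corners.
        norm = math.dist(true_face[left_eye_idx], true_face[right_eye_idx])
        if norm == 0:
            raise ValueError("inter-ocular distance must be non-zero")
        # Mean Euclidean error over all keypoints of this face.
        mean_err = sum(
            math.dist(p, t) for p, t in zip(pred_face, true_face)
        ) / len(true_face)
        total += mean_err / norm

    return 100.0 * total / len(true)


# Synthetic example: one face with 68 keypoints on a horizontal line.
true_face = [(float(i), 0.0) for i in range(68)]
true_face[36] = (0.0, 0.0)
true_face[45] = (100.0, 0.0)
# Shift every predicted keypoint by (3, 4), i.e. a 5-pixel error.
pred_face = [(x + 3.0, y + 4.0) for x, y in true_face]

nme = normalized_mean_error([pred_face], [true_face])
```

In this synthetic case every keypoint is off by 5 pixels and the inter-ocular distance is 100 pixels, so the NME is about 5%. Other normalizers, such as the face bounding-box diagonal, would change the scale of the reported number, so values are only comparable under the same convention.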
Computer Science, Engineering