Joint Stacked Hourglass Network and Salient Region Attention Refinement for Robust Face Alignment

Junfeng Zhang, Haifeng Hu, Guobin Shen
DOI: https://doi.org/10.1145/3374760
IF: 4.094
2020-01-01
ACM Transactions on Multimedia Computing, Communications, and Applications
Abstract: Facial landmark detection aims to locate keypoints in facial images, which typically suffer from variations caused by arbitrary pose, diverse facial expressions, and partial occlusion. In this article, we propose a coarse-to-fine framework that combines a stacked hourglass network with salient region attention refinement for robust face alignment. We first present a multi-scale region learning module that analyzes the structural information of different facial regions and extracts strongly discriminative deep features. We then employ a stacked hourglass network for heatmap regression and initial facial landmark prediction. Specifically, the stacked hourglass network uses an improved Inception-ResNet unit as its basic building block, which effectively enlarges the receptive field and learns contextual feature representations. Meanwhile, a novel loss function takes both global and local weights into account to make the heatmap regression more accurate. Unlike existing heatmap regression models, we present a salient region attention refinement module that extracts precise features based on the heatmap regression and uses the filtered features for landmark refinement to achieve accurate prediction. Extensive experimental results on several challenging datasets (including 300 Faces in the Wild, Caltech Occluded Faces in the Wild, and Annotated Facial Landmarks in the Wild) confirm that our approach achieves more competitive performance than state-of-the-art algorithms.
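To make the heatmap regression idea in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The Gaussian target encoding and the argmax decoding are the generic form of heatmap regression. The function names, the sigma value, and in particular the weighting scheme in weighted_mse (a uniform global weight of 1 plus a hypothetical extra local weight on pixels near the landmark) are assumptions made for illustration only; the abstract does not specify how the paper's global and local weights are defined.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.5):
    """Build a height x width target heatmap (list of rows) peaking at (cx, cy)."""
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def decode_landmark(heatmap):
    """Return the (x, y) position of the strongest response in the heatmap."""
    best_value, best_xy = float("-inf"), (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy


def weighted_mse(pred, target, local_weight=4.0, threshold=0.1):
    """Mean squared error with per-pixel weights.

    ASSUMPTION: every pixel gets a global weight of 1.0, and pixels whose
    target value exceeds `threshold` (the region around the landmark) get
    an additional `local_weight`. This is an illustrative scheme, not the
    loss defined in the paper.
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            weight = 1.0 + (local_weight if t > threshold else 0.0)
            total += weight * (p - t) ** 2
            count += 1
    return total / count


# Encode one landmark at (20, 33) on a 64 x 64 grid, then decode it again.
target = gaussian_heatmap(64, 64, 20, 33)
recovered = decode_landmark(target)  # (20, 33)
```

In a network of the kind described, one such heatmap would be predicted per landmark, and the decoded peak positions would serve as the initial (coarse) landmark estimates that a later stage refines.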