Abstract: In this paper, we introduce a novel method for facial landmark detection, termed the spatial fusion regression augmentation network (SFRA), which aims to improve the spatial generalization capability of coordinate-based regression techniques. The SFRA consists of three components: the landmark feature enhancement (LFE) network, the spatial relationship modeling (SRM) network, and the landmark coordinate regression (LCR) network. The LFE incorporates prior information into the extracted landmark features, thus boosting their representational capacity. The SRM models both the local and the global interactions among the landmark patches before they are passed to the LCR, which predicts the landmark coordinates. The coordinates predicted by the LCR at each stage are then fed into the subsequent stage's LFE to improve its feature representation, creating a pyramid-like multistage structure that refines details progressively from coarse to fine and ultimately yields precise facial landmark predictions. The proposed method was evaluated on the 300W, WFLW and COFW datasets, achieving normalized mean errors (NME) of 3.07%, 4.11% and 3.29%, respectively, which makes it the best-performing coordinate-based regression method. Its performance relative to the best heatmap-based regression methods is mixed: it is better on the WFLW dataset and slightly worse on the 300W dataset; on the COFW dataset, it is better when the NME is normalized by the inter-ocular distance, but slightly worse than the best heatmap-based method when the NME is normalized by the inter-pupil distance. Lastly, ablation studies are performed to confirm the efficacy of the proposed approach.
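Because the COFW comparison above depends on which normalization distance is used, a minimal sketch of a per-face NME computation may help make the distinction concrete. It assumes the common definition (mean Euclidean landmark error divided by a reference distance, reported as a percentage); the function name `nme`, its parameters, and the toy coordinates are illustrative assumptions and are not taken from the paper.

```python
import math


def nme(pred, gt, norm_a, norm_b):
    """Normalized mean error for one face, as a percentage.

    pred, gt: equal-length sequences of (x, y) landmark coordinates.
    norm_a, norm_b: the two ground-truth points that define the
        normalizing distance (e.g. outer eye corners for inter-ocular,
        pupil centers for inter-pupil). Hypothetical interface.
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and equally long")
    d = math.dist(norm_a, norm_b)
    if d == 0:
        raise ValueError("normalizing distance must be non-zero")
    # Mean point-to-point Euclidean error over all landmarks.
    mean_err = sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
    return 100.0 * mean_err / d


# Toy example (made-up coordinates): every prediction is off by (3, 4),
# i.e. a Euclidean error of 5 pixels per landmark.
gt = [(30, 40), (70, 40), (50, 60), (50, 80)]
pred = [(x + 3, y + 4) for x, y in gt]

outer_corners = ((20, 40), (80, 40))  # assumed inter-ocular reference points
pupils = ((30, 40), (70, 40))         # assumed inter-pupil reference points

nme_ocular = nme(pred, gt, *outer_corners)
nme_pupil = nme(pred, gt, *pupils)
```

In this toy case the same predictions score lower under inter-ocular normalization than under inter-pupil normalization, simply because the outer eye corners are farther apart than the pupils. This is why results reported under the two conventions are not directly comparable.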

SFRA: spatial fusion regression augmentation network for facial landmark detection
