Abstract: At present, deep neural network methods play a dominant role in the face alignment field. However, they generally use predefined network structures to predict landmarks, and such structures tend to learn general features, leading to mediocre performance: they perform well on neutral samples but struggle with faces exhibiting large poses or occlusions. Moreover, they cannot effectively deal with semantic gaps and ambiguities among features at different scales, which may hinder them from learning efficient features. To address these issues, we propose a Dynamic Semantic-Aggregation Transformer (DSAT) for learning more discriminative and representative (i.e., specialized) features. Specifically, a Dynamic Semantic-Aware (DSA) model is first proposed to partition samples into subsets and activate specific pathways for them by estimating the semantic correlations of feature channels, making it possible to learn specialized features from each subset. Then, a novel Dynamic Semantic Specialization (DSS) model is designed to mine the homogeneous information from features at different scales, eliminating the semantic gap and ambiguities and enhancing the representation ability. Finally, by integrating the DSA and DSS models into the proposed DSAT in both dynamic-architecture and dynamic-parameter manners, more specialized features can be learned to achieve more precise face alignment. Interestingly, harder samples can be handled by activating more feature channels. Extensive experiments on popular face alignment datasets demonstrate that the proposed DSAT outperforms state-of-the-art models in the literature. Our code is available at https://github.com/GERMINO-LiuHe/DSAT.
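The idea of activating a per-sample subset of feature channels based on channel correlations can be illustrated with a toy sketch. This is not the DSAT implementation: the function name `channel_gate`, the `keep_ratio` parameter, and the scoring rule (mean absolute correlation between channels) are all assumptions made purely for illustration, and the abstract does not specify how the DSA model actually scores or selects channels.

```python
import numpy as np


def channel_gate(features, keep_ratio=0.5):
    """Toy per-sample channel gating (hypothetical, NOT the DSAT method).

    features: array of shape (C, H, W) holding one sample's feature map.
    keep_ratio: assumed fraction of channels to keep active.
    Returns the gated features and a boolean mask over channels.
    """
    c = features.shape[0]
    flat = features.reshape(c, -1)
    # Correlation between the spatial responses of every pair of channels.
    # Constant channels yield NaN, which is mapped to 0 here.
    corr = np.nan_to_num(np.corrcoef(flat))
    # Score each channel by its mean absolute correlation with the others
    # (the diagonal self-correlation of 1 is subtracted out).
    score = (np.abs(corr).sum(axis=1) - 1.0) / (c - 1)
    # Activate only the top-k scoring channels for this sample.
    k = max(1, int(round(keep_ratio * c)))
    keep = np.argsort(score)[-k:]
    mask = np.zeros(c, dtype=bool)
    mask[keep] = True
    # Inactive channels are zeroed; active channels pass through unchanged.
    return features * mask[:, None, None], mask


# Usage on random data standing in for a real feature map.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))
gated, mask = channel_gate(features, keep_ratio=0.5)
```

In this sketch the number of active channels is fixed by `keep_ratio`; a dynamic model in the spirit of the abstract would instead let that number vary per sample, for example activating more channels for harder inputs.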

Multi-Scale Aggregation Network for Direct Face Alignment

Multi‐scale Cross‐domain Alignment for Person Image Generation

Selective Domain-Invariant Feature Alignment Network for Face Anti-Spoofing.

Multi-Attention Network for 2D Face Alignment in the Wild.

SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction

Real-Time Facial Landmark Detection by Attention-driven Lightweight Network

Attentional Alignment Networks.

3D Dense Face Alignment with Fused Features by Aggregating CNNs and GCNs

Feature Agglomeration Networks for Single Stage Face Detection

Multi-scale Attention Guided Network for End-to-end Face Alignment and Recognition

Deep Multi-Center Learning for Face Alignment

Learning a Multi-Center Convolutional Network for Unconstrained Face Alignment.

Multi-image 3D Face Reconstruction Via an Adaptive Aggregation Network.

Cascaded Deep Convolutional Neural Network for Robust Face Alignment.

Multi-task Convolution Network for Face Alignment

Precise Facial Landmark Detection by Dynamic Semantic Aggregation Transformer

Face Alignment Across Large Poses: A 3D Solution

Attention-Guided Coarse-to-Fine Network for 2D Face Alignment in the Wild

Towards Rich-Detail 3D Face Reconstruction and Dense Alignment Via Multi-Scale Detail Augmentation.

Multistage Model for Robust Face Alignment Using Deep Neural Networks

Deep Shape Constrained Network for Robust Face Alignment.