Dynamic Deformable Transformer for End‐to‐end Face Alignment

Liming Han, Chi Yang, Qing Li, Bin Yao, Zixian Jiao, Qianyang Xie
DOI: https://doi.org/10.1049/cvi2.12208
IF: 1.484
2023-01-01
IET Computer Vision
Abstract: Heatmap-based regression (HBR) methods have long dominated face alignment, but they require complex designs and post-processing. In this study, the authors propose a simple, end-to-end coordinate-based regression (CBR) method for face alignment called Dynamic Deformable Transformer (DDT). Unlike methods that rely on general pre-defined landmark queries, DDT uses Dynamic Landmark Queries (DLQs) to query landmarks' classes and coordinates together. In addition, DDT adopts a deformable attention mechanism, which converges faster and has lower computational complexity than a regular attention mechanism. Experimental results on three mainstream datasets, 300W, WFLW, and COFW, demonstrate that DDT exceeds the state-of-the-art CBR methods by a large margin and is comparable to the current state-of-the-art HBR methods with much lower computational complexity.
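To make the HBR/CBR contrast in the abstract concrete, the following is a minimal sketch of how the two families typically turn network output into a landmark position. It is not the DDT implementation: the function names, the toy heatmap, and the assumption that a CBR head predicts coordinates normalized to [0, 1] are all illustrative choices, not details taken from the paper.

```python
# Illustrative only: a toy contrast between HBR and CBR decoding.
# Nothing here is taken from the DDT paper; names and values are hypothetical.

def decode_heatmap(heatmap):
    """HBR-style post-processing: return the (x, y) cell with the highest score."""
    best_value, best_xy = float("-inf"), (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy  # integer grid position, limited by the heatmap resolution


def decode_coordinates(normalized_xy, width, height):
    """CBR-style output: scale a predicted (x, y) in [0, 1] (assumed) to pixels."""
    nx, ny = normalized_xy
    return (nx * width, ny * height)


# One landmark's toy heatmap (3 rows by 4 columns); the peak is at x=2, y=1.
heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.2, 0.9, 0.1],
    [0.0, 0.1, 0.3, 0.0],
]
hbr_point = decode_heatmap(heatmap)
cbr_point = decode_coordinates((0.5, 0.25), width=256, height=256)
print(hbr_point, cbr_point)
```

In this sketch the heatmap route needs a search over every cell per landmark and yields a position quantized to the grid, whereas the coordinate route is a single rescaling of the predicted values.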