End-to-End Spatial Transform Face Detection and Recognition

Hongxin Zhang, Liying Chi
DOI: https://doi.org/10.1016/j.vrih.2020.04.002
2020-01-01
Virtual Reality & Intelligent Hardware
Abstract: Many face detection and recognition methods have been proposed over the past decades and have achieved excellent results. A common face recognition pipeline consists of 1) face detection, 2) face alignment, 3) feature extraction, and 4) similarity calculation, stages that are separate and independent of each other. Separating the stages leads to redundant computation and makes end-to-end training difficult. In this paper, we propose a novel end-to-end trainable convolutional network framework for face detection and recognition, in which a geometric transformation matrix is learned directly to align the faces instead of predicting facial landmarks. During training, our single CNN model is supervised only by face bounding boxes and personal identities, which are publicly available from the WIDER FACE [52] and CASIA-WebFace [53] datasets. Tested on the Face Detection Dataset and Benchmark (FDDB) [21] and Labeled Faces in the Wild (LFW) [19] datasets, our model simultaneously achieves 89.24% recall on the face detection task and 98.63% verification accuracy on the face recognition task, which is comparable to state-of-the-art results.
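To make the alignment idea concrete, the following is a minimal sketch of how a 2x3 geometric transformation matrix can warp a face crop by resampling, in the style of a generic spatial transformer. The abstract does not specify the network, the parameterization of the matrix, or the sampling scheme, so everything here is an assumption for illustration: the function names `affine_source_coords` and `bilinear_sample`, the normalized [-1, 1] coordinate convention, and the border clamping are all hypothetical choices, and in the actual framework the matrix would be predicted by the CNN rather than set by hand.

```python
import math


def affine_source_coords(theta, out_h, out_w):
    """For each output pixel, compute its source location in [-1, 1] coordinates.

    theta is a 2x3 matrix ((a, b, tx), (c, d, ty)); out_h and out_w must be >= 2.
    """
    (a, b, tx), (c, d, ty) = theta
    grid = []
    for i in range(out_h):
        y = -1.0 + 2.0 * i / (out_h - 1)
        row = []
        for j in range(out_w):
            x = -1.0 + 2.0 * j / (out_w - 1)
            row.append((a * x + b * y + tx, c * x + d * y + ty))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sample a 2D image (list of rows, at least 2x2) at the grid locations."""
    h, w = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for xs, ys in row:
            # Normalized -> pixel coordinates; clamping reuses border pixels
            # for points that fall outside the image (an assumed policy).
            x = min(max((xs + 1.0) * (w - 1) / 2.0, 0.0), w - 1.0)
            y = min(max((ys + 1.0) * (h - 1) / 2.0, 0.0), h - 1.0)
            x0 = min(int(math.floor(x)), w - 2)
            y0 = min(int(math.floor(y)), h - 2)
            wx, wy = x - x0, y - y0
            top = image[y0][x0] * (1 - wx) + image[y0][x0 + 1] * wx
            bottom = image[y0 + 1][x0] * (1 - wx) + image[y0 + 1][x0 + 1] * wx
            out_row.append(top * (1 - wy) + bottom * wy)
        out.append(out_row)
    return out


# Hand-picked matrices standing in for a network prediction.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
HORIZONTAL_FLIP = ((-1.0, 0.0, 0.0), (0.0, 1.0, 0.0))

face = [[float(4 * r + c) for c in range(4)] for r in range(4)]
aligned = bilinear_sample(face, affine_source_coords(IDENTITY, 4, 4))
flipped = bilinear_sample(face, affine_source_coords(HORIZONTAL_FLIP, 4, 4))
```

Because every step is a smooth function of the matrix entries, a recognition loss computed on the warped crop can in principle propagate gradients back to whatever predicts the matrix, which is what allows alignment to be trained without landmark supervision.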