Hybrid Token Transformer for Deep Face Recognition

Weicong Su, Yali Wang, Kunchang Li, Peng Gao, Yu Qiao
DOI: https://doi.org/10.1016/j.patcog.2023.109443
IF: 8
2023-02-23
Pattern Recognition
Abstract: Although Convolutional Neural Networks (CNNs) have achieved remarkable success in face recognition, they still suffer from a critical limitation in capturing long-range relations among facial regions. Recent vision transformers can naturally alleviate this problem by learning global token dependencies. However, they are insufficient for discovering high-level facial semantics, since tokens in these transformers are based on small, fixed regions. To tackle this difficulty, we propose a novel Hybrid tOken Transformer (HOTformer) module that identifies key facial semantics for effective recognition through the cooperation of atomic and holistic tokens. Specifically, atomic tokens are generated from small fixed-size regions and learn a fine-grained core representation. In contrast, holistic tokens are constructed from large adaptively learned regions and capture a coarse-grained contextual representation. Furthermore, our HOTformer is a plug-and-play module. By hierarchically inserting it into convolutional networks, we can build a concise HOTformer-Net that achieves computational efficiency comparable to a CNN while boosting accuracy like a transformer.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
What problem does this paper attempt to address?