Dual Adversarial Disentanglement and Deep Representation Decorrelation for NIR-VIS Face Recognition

Weipeng Hu, Haifeng Hu
DOI: https://doi.org/10.1109/tifs.2020.3005314
IF: 7.231
2021-01-01
IEEE Transactions on Information Forensics and Security
Abstract: Near-infrared and visual (NIR-VIS) face recognition is the task of matching face data across different modalities, with broad application prospects in areas such as multimedia information retrieval and criminal investigation. However, it remains challenging because of high intra-class variations and the small scale of NIR-VIS datasets. In this paper, we propose a novel approach called Dual Adversarial Disentanglement and deep Representation Decorrelation (DADRD) to solve the NIR-VIS matching problem. To reduce the gap between NIR and VIS images, the DADRD model comprises three key components: the Cross-modal Margin (CmM) loss, Dual Adversarial Disentangled Variations (DADV), and Deep Representation Decorrelation (DRD). First, the CmM loss captures within-class and between-class information in the data, and it further reduces the modality difference through a center-variation term. Second, the Mixed Facial Representation (MFR) layer of the backbone network is divided into three parts: the identity-related layer, the modality-related layer, and the residual-related layer. The DADV, which consists of Adversarial Disentangled Modality Variations (ADMV) and Adversarial Disentangled Residual Variations (ADRV), is designed to reduce intra-class variations. Specifically, the ADMV and ADRV eliminate spectrum variations and residual variations (i.e., lighting, pose, expression, occlusion, etc.), respectively, via an adversarial mechanism. Finally, we impose the DRD on the three decomposed features to make them uncorrelated with each other, which separates the three components more effectively and enhances the feature representations. In addition, we develop a Joint Three-stage Optimization (JTsO) strategy to optimize the network effectively. The joint formulation leads to the purification of identity information and the disentanglement of within-class variation information. Extensive experiments have been carried out on three challenging datasets, and the results demonstrate the effectiveness of our method.
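The abstract does not give the DRD formula, so the following is only a minimal illustrative sketch of the general idea: split a mixed feature vector into identity, modality, and residual parts, then penalize correlation between the parts. The function names, the split sizes, and the choice of mean squared Pearson correlation as the penalty are all assumptions made for illustration, not the authors' formulation.

```python
import math

def split_features(mfr, id_dim, mod_dim):
    """Split each mixed feature row into (identity, modality, residual) parts.

    The split sizes are hypothetical; the abstract does not specify them.
    """
    ident = [row[:id_dim] for row in mfr]
    modal = [row[id_dim:id_dim + mod_dim] for row in mfr]
    resid = [row[id_dim + mod_dim:] for row in mfr]
    return ident, modal, resid

def _standardize_columns(x):
    """Return the columns of x, each centered and scaled to unit norm over the batch."""
    cols = []
    for col in zip(*x):
        mean = sum(col) / len(col)
        centered = [v - mean for v in col]
        # A constant column has zero norm; fall back to 1.0 so it contributes 0.
        norm = math.sqrt(sum(v * v for v in centered)) or 1.0
        cols.append([v / norm for v in centered])
    return cols

def cross_correlation_penalty(a, b):
    """Mean squared Pearson correlation over all column pairs of a and b."""
    cols_a, cols_b = _standardize_columns(a), _standardize_columns(b)
    total = 0.0
    for ca in cols_a:
        for cb in cols_b:
            r = sum(u * v for u, v in zip(ca, cb))
            total += r * r
    return total / (len(cols_a) * len(cols_b))

def decorrelation_loss(ident, modal, resid):
    """Sum of pairwise penalties across the three parts (an assumed stand-in for DRD)."""
    return (cross_correlation_penalty(ident, modal)
            + cross_correlation_penalty(ident, resid)
            + cross_correlation_penalty(modal, resid))
```

In this sketch the loss is zero when no dimension of one part is linearly correlated with any dimension of another part over the batch, and it grows as the parts share linear information. In a real training setup such a term would be computed on network activations with an autodiff framework and combined with the other losses.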