Abstract: Near-infrared and visual (NIR-VIS) face matching, the most typical task in Heterogeneous Face Recognition (HFR), has attracted increasing attention in recent years. However, it remains difficult because of large within-class discrepancies, which include both domain differences and residual discrepancies (e.g., lighting, expression, occlusion, blur, and pose). Conventional NIR-VIS face recognition methods focus only on reducing the modality gap between cross-domain images and neglect the residual variations. To address both problems, this paper proposes a novel Orthogonal Modality Disentanglement and Representation Alignment (OMDRA) approach, which consists of three key components: a Modality-Invariant (MI) loss, Orthogonal Modality Disentanglement (OMD), and Deep Representation Alignment (DRA). First, the MI loss is designed to learn modality-invariant and identity-discriminative representations by increasing between-class separability and within-class compactness across NIR and VIS heterogeneous data. Second, the high-level Hybrid Facial Feature (HFF) layer of the backbone network is projected into two subspaces: a modality-related subspace and an identity-related subspace. The OMD decouples modality information via an adversarial process, and we further impose Orthogonal Representation Decorrelation (ORD) on the OMD to decrease the correlation between identity representations and domain representations while enhancing their representation capabilities. Finally, the DRA aims to eliminate the residual variations by performing a high-level representation alignment between non-neutral and neutral faces, which effectively guides the network to learn discriminative and residual-invariant face representations. The joint scheme enables the disentanglement of modality variations, the elimination of residual discrepancies, and the purification of identity information. Extensive experiments on challenging cross-domain databases indicate that our OMDRA method is superior to state-of-the-art methods.
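The abstract does not give the exact form of the ORD and DRA terms, so the following is only a minimal sketch of the two ideas, not the authors' formulation. It assumes that the identity and modality subspaces have the same dimension, that decorrelation can be approximated by penalizing the squared cosine similarity between each sample's identity vector and modality vector, and that alignment can be approximated by a mean squared distance between paired non-neutral and neutral representations. The function names `orthogonality_penalty` and `alignment_loss` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v, eps=1e-12):
    """Cosine similarity; eps guards against division by zero."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + eps)


def orthogonality_penalty(identity_feats, modality_feats):
    """Mean squared cosine between paired identity and modality vectors.

    Assumed stand-in for a decorrelation term: it is zero when every
    identity vector is orthogonal to its paired modality vector.
    """
    cosines = [cosine(u, v) for u, v in zip(identity_feats, modality_feats)]
    return sum(c * c for c in cosines) / len(cosines)


def alignment_loss(non_neutral_feats, neutral_feats):
    """Mean squared Euclidean distance between paired representations.

    Assumed stand-in for an alignment term: it is zero when each
    non-neutral representation equals its neutral counterpart.
    """
    dists = [
        sum((a - b) ** 2 for a, b in zip(u, v))
        for u, v in zip(non_neutral_feats, neutral_feats)
    ]
    return sum(dists) / len(dists)


# Toy batch of two samples with 2-dimensional features (illustrative values).
identity = [[1.0, 0.0], [0.0, 2.0]]
modality = [[0.0, 3.0], [1.0, 0.0]]
print(orthogonality_penalty(identity, modality))  # orthogonal pairs give 0.0
print(alignment_loss([[1.0, 0.0]], [[0.0, 0.0]]))  # unit offset gives 1.0
```

In a training setup, terms of this kind would typically be weighted and added to the identity classification objective; the weights and the adversarial modality classifier are outside the scope of this sketch.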

Cross-Modal and Multi-Attribute Face Recognition: A Benchmark

Towards Mask-robust Face Recognition.

Near-infrared and visible light face recognition: a comprehensive survey

A NIR-to-VIS face recognition via part adaptive and relation attention module

Cross-spectral Face Completion for NIR-VIS Heterogeneous Face Recognition

Orthogonal Modality Disentanglement and Representation Alignment Network for NIR-VIS Face Recognition

RGB-IR Person Re-identification by Cross-Modality Similarity Preservation

Multimodal Ranking Optimization for Heterogeneous Face Re-identification

Adversarial Cross-Spectral Face Completion for NIR-VIS Face Recognition

Flexible-Modal Face Anti-Spoofing: A Benchmark

Cross-Modal Object Tracking: Modality-Aware Representations and a Unified Benchmark

Voice-Face Cross-modal Matching and Retrieval: A Benchmark

Not Afraid of the Dark: NIR-VIS Face Recognition via Cross-spectral Hallucination and Low-rank Embedding

Robust Face Recognition via Multimodal Deep Face Representation

Heterogeneous Visible-Thermal and Visible-Infrared Face Recognition Using Cross-Modality Discriminator Network and Unit-Class Loss

A Bidirectional Conversion Network for Cross-Spectral Face Recognition

A Good Practice Towards Top Performance of Face Recognition: Transferred Deep Feature Fusion

Cross-directional consistency network with adaptive layer normalization for multi-spectral vehicle re-identification and a high-quality benchmark

Cross-Spectral Attention for Unsupervised RGB-IR Face Verification and Person Re-identification

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

Multi-spectral Vehicle Re-identification with Cross-directional Consistency Network and a High-quality Benchmark