Abstract: Heterogeneous face recognition (HFR) aims to match face images of the same person across different modalities. Most HFR methods bridge cross-modality variations by aligning features learned as global representations, but they ignore both the content information of local features and the modality-style information of the face images in each modality, which limits HFR performance. The content information of local features not only captures modality-invariant facial characteristics but can also improve the stability of global face features, since local features such as the eyes, nose, and mouth are steady and invariant. With this motivation, we propose a cross-modality dual-constraint (CMDC) approach that includes a part-facial relational attention network (PRAN) and a modality-style attention network (MSAN). First, PRAN is designed to estimate the intrinsic structural relationships of local content features in each modality. It extracts discriminative local face features by capturing correlations within the face space of an individual modality, and strengthens representations through contextual relationships across modalities. Second, we design MSAN to capture the modality-style information of each modality and then reduce inter-modality differences by minimizing the distance between the two modality-style features. Third, to alleviate cross-modality variances and enhance intra-class compactness and inter-class separability, we propose the cross-modality dual-constrained loss (DCLoss) in the CMDC approach, which adds a global constraint to each sample distribution in the embedding space. In addition to focusing on modality-style information, DCLoss emphasizes the significance of category information. Extensive experiments on four datasets demonstrate the superior performance of our approach over existing state-of-the-art methods. The code is available at https://github.com/JianYu777/CMDC.
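The idea of reducing inter-modality differences by minimizing the distance between two modality-style features can be illustrated with a minimal sketch. The abstract does not specify how MSAN computes style features or which distance it uses, so everything below is an assumption made for illustration: style is approximated by per-channel mean and standard deviation (a convention borrowed from style-transfer work), the distance is squared Euclidean, and the names `style_vector` and `style_distance` are hypothetical. It is not the implementation in the linked repository.

```python
import math


def style_vector(feature_map):
    """Summarize a feature map as per-channel means followed by per-channel stds.

    feature_map: list of channels, each a flat list of activations.
    Assumption: channel statistics stand in for "modality-style" here;
    the paper's MSAN may compute style features differently.
    """
    means = [sum(ch) / len(ch) for ch in feature_map]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in ch) / len(ch))
        for ch, m in zip(feature_map, means)
    ]
    return means + stds


def style_distance(feat_a, feat_b):
    """Squared Euclidean distance between the two style vectors (assumed metric)."""
    sa, sb = style_vector(feat_a), style_vector(feat_b)
    return sum((x - y) ** 2 for x, y in zip(sa, sb))


# Toy feature maps for two modalities (two channels, two activations each).
visible = [[1.0, 3.0], [2.0, 2.0]]
infrared = [[2.0, 4.0], [2.0, 2.0]]

# A training loop would add this term to the loss so that it shrinks over time.
loss_term = style_distance(visible, infrared)
```

In this toy case the two feature maps differ only in the mean of their first channel, so the term penalizes exactly that shift; identical inputs give a distance of zero.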

Person Recognition with HGR Maximal Correlation on Multimodal Data

Modality-transfer Generative Adversarial Network and Dual-Level Unified Latent Representation for Visible Thermal Person Re-Identification

Wearable Sensor Based Multimodal Human Activity Recognition Exploiting the Diversity of Classifier Ensemble

HGR Correlation Pooling Fusion Framework for Recognition and Classification in Multimodal Remote Sensing Data

Feature Correlation Hypergraph: Exploiting High-order Potentials for Multimodal Recognition

A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition

Part-facial relational and modality-style attention networks for heterogeneous face recognition

Frame Aggregation and Multi-Modal Fusion Framework for Video-Based Person Recognition

A Maximal Correlation Embedding Method for Multilabel Human Context Recognition

Feature relationships hypergraph for multimodal recognition

A Multimodal Dynamic Hand Gesture Recognition Based on Radar–Vision Fusion

Comparative Analysis of Modality Fusion Approaches for Audio-Visual Person Identification and Verification

An Efficient Approach to Informative Feature Extraction from Multimodal Data

Audio-Visual Fusion Based on Interactive Attention for Person Verification

Human-centric multimodal fusion network for robust action recognition

Discovering attention-guided cross-modality correlation for visible–infrared person re-identification

A Hybrid Multimodal Fusion Framework for Semg-Acc-based Hand Gesture Recognition

HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition

Multi-Granularity Hypergraphs and Adversarial Complementary Learning for Person Re-identification

HGR Maximal Correlation Augmented Cross-Modal Remote Sensing Retrieval

Multimodal heterogeneous graph entity-level fusion for named entity recognition with multi-granularity visual guidance