Abstract: Recent research increasingly emphasizes analyzing multiple features to improve face recognition (FR) performance. One popular scheme extends the sparse-representation-based classification framework with various sparsity constraints. Although these methods study multiple features jointly through the constraints, they still process each feature individually and therefore overlook possible high-level relationships among different features. It is reasonable to assume that low-level features of facial images, such as edge information and the smoothed (low-frequency) image, can be fused into a more compact and more discriminative representation based on these latent high-level relationships. FR on the fused features is expected to outperform FR on the original features, because the fused representation is more compact and more discriminative. Building on this idea, we propose two strategies that first fuse multiple features and then exploit the dictionary learning (DL) framework for better FR performance. The first strategy is a simple and efficient two-step model: it learns a fusion matrix from training face images to fuse multiple features, and then learns class-specific dictionaries on the fused features. The second is a more effective but computationally more expensive model that learns the fusion matrix and the class-specific dictionaries simultaneously within an iterative optimization procedure. In addition, the second model separates the shared common components from the class-specific dictionaries to enhance their discriminative power.
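The abstract does not say how the fusion matrix or the dictionaries are learned, so the following is only a minimal sketch of the two-step strategy's data flow, written in Python/NumPy under explicit assumptions. Each face is described by two low-level features named in the abstract: an edge map (gradient magnitude) and a smoothed image (a 3x3 box blur, which is our choice). PCA on the stacked features stands in for the learned fusion matrix, a per-class truncated SVD stands in for class-specific dictionary learning, and a test face is assigned to the class whose dictionary reconstructs its fused feature with the smallest residual, as in sparse-representation-style classifiers. All function names and parameters (`dim`, `n_atoms`, `lam`) are hypothetical.

```python
import numpy as np


def extract_features(img):
    """Stack two low-level features: an edge map (gradient magnitude)
    and a smoothed/low-frequency image (3x3 box blur, an assumption)."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)
    edges = np.hypot(gx, gy)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    smooth = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return np.concatenate([edges.ravel(), smooth.ravel()])


def learn_fusion_matrix(X, dim):
    """Hypothetical stand-in for the learned fusion matrix P: PCA on the
    stacked features. X holds one stacked feature vector per column.
    Returns P (dim x d) and the training mean used for centering."""
    mean = X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(X - mean, full_matrices=False)
    return U[:, :dim].T, mean


def fuse(P, mean, X):
    """Map stacked features into the compact fused space."""
    return P @ (X - mean)


def learn_class_dictionaries(Z, labels, n_atoms):
    """One dictionary per class: the leading left singular vectors of that
    class's fused training samples (a simple substitute for a DL solver)."""
    dicts = {}
    for c in np.unique(labels):
        U, _, _ = np.linalg.svd(Z[:, labels == c], full_matrices=False)
        dicts[c] = U[:, :n_atoms]
    return dicts


def classify(dicts, z, lam=1e-3):
    """Code z over each class dictionary with ridge-regularized least squares
    and return the class with the smallest reconstruction residual."""
    best_class, best_res = None, np.inf
    for c, D in dicts.items():
        a = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ z)
        res = np.linalg.norm(z - D @ a)
        if res < best_res:
            best_class, best_res = c, res
    return best_class


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    templates = rng.normal(size=(4, 8, 8))  # 4 synthetic "identities"

    def sample(c):
        return templates[c] + 0.1 * rng.normal(size=(8, 8))

    train_labels = np.repeat(np.arange(4), 10)
    X_train = np.column_stack([extract_features(sample(c)) for c in train_labels])
    P, mean = learn_fusion_matrix(X_train, dim=16)
    dicts = learn_class_dictionaries(fuse(P, mean, X_train), train_labels, n_atoms=3)

    test_labels = np.repeat(np.arange(4), 5)
    X_test = np.column_stack([extract_features(sample(c)) for c in test_labels])
    Z_test = fuse(P, mean, X_test)
    preds = np.array([classify(dicts, Z_test[:, i]) for i in range(Z_test.shape[1])])
    print("accuracy:", np.mean(preds == test_labels))
```

Because PCA and SVD are unsupervised, this sketch only mirrors the structure of the two-step pipeline (fuse, learn per-class dictionaries, classify by residual), not whatever discriminative criterion the authors use to learn the fusion matrix and the dictionaries.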
The proposed strategies, which integrate the multi-feature fusion process and the dictionary learning framework for FR, achieve the following goals: (1) exploiting multiple features of face images for better FR performance; (2) learning a fusion matrix that merges the features into a more compact and more discriminative representation; and (3) learning class-specific dictionaries that account for common patterns, for better classification performance. We perform a series of experiments on publicly available databases to evaluate our methods, and the experimental results demonstrate the effectiveness of the proposed models.
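Goal (3), separating patterns shared by all faces from class-specific ones, can be illustrated with a similarly hedged sketch. The second model learns everything jointly within an iterative procedure; the code below instead uses a simple, hypothetical two-stage substitute. A shared dictionary `D0` is taken as the leading left singular vectors of all fused training samples, each class dictionary is fitted to that class's samples after their `D0` component has been removed, and classification compares class-wise residuals after the same removal. The function names and parameters (`n_common`, `n_atoms`) are assumptions, not the authors' formulation.

```python
import numpy as np


def learn_common_and_class_dicts(Z, labels, n_common, n_atoms):
    """Hypothetical two-stage stand-in for the joint model.

    D0: leading left singular vectors of all fused training samples
    (directions shared across classes). Each class dictionary is then fit
    to that class's samples after their D0 component has been removed.
    """
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    D0 = U[:, :n_common]
    R = Z - D0 @ (D0.T @ Z)  # D0 has orthonormal columns, so this removes its span
    dicts = {}
    for c in np.unique(labels):
        Uc, _, _ = np.linalg.svd(R[:, labels == c], full_matrices=False)
        dicts[c] = Uc[:, :n_atoms]
    return D0, dicts


def classify_with_common(D0, dicts, z):
    """Strip the shared component from z, then return the class whose
    (orthonormal) dictionary leaves the smallest reconstruction residual."""
    r = z - D0 @ (D0.T @ z)
    residuals = {c: np.linalg.norm(r - D @ (D.T @ r)) for c, D in dicts.items()}
    return min(residuals, key=residuals.get)
```

Removing the shared span before comparing residuals keeps directions that every class reconstructs well (for example, an average face) from masking the differences between classes.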

DFR-ECAPA: Diffusion Feature Refinement for Speaker Verification Based on ECAPA-TDNN

Integration of multi-feature fusion and dictionary learning for face recognition

PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification

Dual-model self-regularization and fusion for domain adaptation of robust speaker verification

DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter for Speaker Verification

NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification

Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification

Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement

CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking

Validation of an ECAPA-TDNN system for Forensic Automatic Speaker Recognition under case work conditions

Speaker recognition based on improved ECAPA-TDNN network

CRA-DIFFUSE: Improved Cross-Domain Speech Enhancement Based on Diffusion Model with T-F Domain Pre-Denoising

Self-supervised learning with diffusion-based multichannel speech enhancement for speaker verification under noisy conditions

Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models

Revisiting Denoising Diffusion Probabilistic Models for Speech Enhancement: Condition Collapse, Efficiency and Refinement

DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer

VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model

Research on voiceprint Recognition system based on ECAPA-TDNN-GRU architecture

DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion

Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation