Abstract: Speaker verification performance degrades when test speech is recorded in different sessions spread over a long period of time. Common ways to alleviate this long-term degradation are enrollment data augmentation, speaker model adaptation, and adapted verification thresholds. From the feature point of view of a pattern recognition system, robust features that are speaker-specific and invariant to time and acoustic environment are preferred for dealing with this long-term variability. In this paper, using a newly created speech database, CSLT-Chronos, collected specifically to reflect long-term speaker variability, we investigate the issue in the frequency domain by emphasizing higher discrimination of speaker-specific information and lower sensitivity to time-related, session-specific information. The F-ratio is employed as the criterion for a figure of merit that judges these two kinds of information and finds a compromise between them. Inspired by the feature extraction procedure of traditional MFCC calculation, we explore two emphasis strategies for generating modified acoustic features for speaker verification: pre-filtering frequency warping and post-filtering weighting of the filter-bank outputs. Experiments show that the two proposed features outperform traditional MFCC on CSLT-Chronos. The proposed approach is also studied on the NIST SRE 2008 database in a state-of-the-art, i-vector based architecture. Experimental results demonstrate the advantage of the proposed features over MFCC in LDA and PLDA based i-vector systems. (c) 2016 Elsevier B.V. All rights reserved.
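As a rough illustration of the F-ratio criterion mentioned in the abstract, the sketch below computes a generic F-ratio for one frequency band: the variance of the group means divided by the average within-group variance. It assumes the textbook definition; the abstract does not give the exact formula or how the speaker and session F-ratios are combined into the figure of merit, so the function name `f_ratio` and the toy band-energy values are hypothetical and purely illustrative.

```python
from statistics import fmean


def f_ratio(groups):
    """Generic F-ratio (assumed textbook form, not taken from the paper):
    variance of the group means divided by the mean within-group variance.

    `groups` is a list of lists; each inner list holds the energy of one
    frequency band for one group (a speaker, or a recording session).
    """
    means = [fmean(g) for g in groups]
    grand_mean = fmean(means)
    # Spread of the group means around the overall mean.
    between = fmean([(m - grand_mean) ** 2 for m in means])
    # Average spread of the samples around their own group mean.
    within = fmean(
        [fmean([(x - m) ** 2 for x in g]) for g, m in zip(groups, means)]
    )
    return between / within


# Hypothetical band energies, grouped two ways for the same band.
by_speaker = [[1.0, 1.2, 0.8], [3.0, 3.2, 2.8]]   # groups = speakers
by_session = [[1.0, 3.0, 2.0], [1.2, 2.8, 2.1]]   # groups = sessions

speaker_f = f_ratio(by_speaker)
session_f = f_ratio(by_session)
print(speaker_f, session_f)
```

Under this reading, a band is attractive when its F-ratio is high with speakers as groups and low with sessions as groups, which matches the abstract's stated goal of high speaker discrimination and low session sensitivity.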

Adaptive Large Margin Fine-Tuning for Robust Speaker Verification

VarASV: Enabling Pitch-variable Automatic Speaker Verification Via Multi-task Learning

Improving Speaker Verification Performance Against Long-Term Speaker Variability

The IDLAB VoxSRC-20 Submission: Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification

Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification

Fine-tune Pre-Trained Models with Multi-Level Feature Fusion for Speaker Verification

Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification

VOT: Revolutionizing Speaker Verification with Memory and Attention Mechanisms

Large Margin Softmax Loss for Speaker Verification

Margin-Mixup: A Method for Robust Speaker Verification in Multi-Speaker Audio

An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification

Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information

Real Additive Margin Softmax for Speaker Verification

Deep Segment Attentive Embedding for Duration Robust Speaker Verification

Short Utterance Speaker Recognition Based on Speech High Frequency Information Compensation and Dynamic Feature Enhancement Methods

DeltaVLAD: an Efficient Optimization Algorithm to Discriminate Speaker Embedding for Text-Independent Speaker Verification

ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency

Learning from human perception to improve automatic speaker verification in style-mismatched conditions

A speaker verification backend with robust performance across conditions