Abstract: Speaker verification performance degrades when test speech is recorded in sessions spread over a long period of time. Common ways to alleviate this long-term degradation are enrollment data augmentation, speaker model adaptation, and adapted verification thresholds. From the feature point of view of a pattern recognition system, the preferred way to deal with this long-term variability is robust features that are speaker-specific and invariant to time and acoustic environment. In this paper, we use CSLT-Chronos, a newly created speech database collected specifically to reflect long-term speaker variability. On this database, we investigate the problem in the frequency domain. The aim is to emphasize frequency regions that offer higher discrimination of speaker-specific information and lower sensitivity to time-related, session-specific information. The F-ratio is employed as the figure of merit for judging these two types of information and for finding a compromise between them. Inspired by the traditional MFCC extraction procedure, two emphasis strategies are explored for generating modified acoustic features for speaker verification: pre-filtering frequency warping and post-filtering weighting of the filter-bank outputs. Experiments show that both proposed features outperform the traditional MFCC on CSLT-Chronos. The proposed approach is also evaluated on the NIST SRE 2008 database with a state-of-the-art i-vector-based architecture. There, the proposed features show an advantage over MFCC in both LDA- and PLDA-based i-vector systems. (c) 2016 Elsevier B.V. All rights reserved.
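The F-ratio criterion and the post-filtering weighting strategy can be illustrated with a short NumPy sketch. It rests on several assumptions.

* Log filter-bank energies are already available as a frames-by-bands array, together with per-frame speaker and session labels.
* The F-ratio follows its usual definition: the variance of the class means divided by the mean within-class variance.
* The abstract does not say how the speaker and session F-ratios are combined. The ratio `f_spk / f_sess` used here is a hypothetical compromise.
* The helper names `f_ratio`, `band_weights`, and `weighted_cepstra` are illustrative.
* The synthetic data are made up.

```python
import numpy as np


def f_ratio(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-dimension F-ratio: variance of the class means divided by the
    average within-class variance (higher = classes better separated)."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    variances = np.stack([features[labels == c].var(axis=0) for c in classes])
    return means.var(axis=0) / (variances.mean(axis=0) + 1e-12)


def band_weights(logfbank, speaker_ids, session_ids):
    """Hypothetical compromise: favour bands with a high speaker F-ratio
    and a low session F-ratio, normalised so the weights average to 1."""
    f_spk = f_ratio(logfbank, speaker_ids)
    # Session sensitivity: F-ratio over sessions computed within each
    # speaker, then averaged, so speaker identity does not leak in.
    f_sess = np.mean(
        [f_ratio(logfbank[speaker_ids == s], session_ids[speaker_ids == s])
         for s in np.unique(speaker_ids)],
        axis=0,
    )
    w = f_spk / (f_sess + 1e-12)
    return w / w.mean()


def weighted_cepstra(logfbank, weights, n_ceps=13):
    """Scale each log filter-bank output by its weight (assumed reading of
    post-filtering weighting), then apply an orthonormal DCT-II as in MFCC."""
    n_bands = logfbank.shape[1]
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_bands)[None, :]
    dct = np.sqrt(2.0 / n_bands) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_bands))
    dct[0] /= np.sqrt(2.0)  # orthonormal scaling of the DC row
    return (logfbank * weights) @ dct.T


# Synthetic example: 3 speakers x 2 sessions x 100 frames, 24 bands.
rng = np.random.default_rng(0)
speaker_ids = np.repeat(np.arange(3), 200)
session_ids = np.tile(np.repeat(np.arange(2), 100), 3)
logfbank = rng.normal(size=(600, 24))
logfbank[:, :8] += 2.0 * speaker_ids[:, None]   # bands 0-7 carry speaker identity
logfbank[:, 16:] += 2.0 * session_ids[:, None]  # bands 16-23 drift with session

w = band_weights(logfbank, speaker_ids, session_ids)
ceps = weighted_cepstra(logfbank, w)
print(ceps.shape)  # (600, 13)
```

In this toy data, bands 0 to 7 carry speaker identity and receive large weights. Bands 16 to 23, which drift between sessions, are suppressed. The pre-filtering frequency-warping strategy would instead reshape the filter placement before filtering, and it is not shown here.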

Contrastive Predictive Coding Based Feature for Automatic Speaker Verification

Maximum Likelihood I-Vector Space Using PCA for Speaker Verification

Regularizing Contrastive Predictive Coding for Speech Applications

Speaker Contrastive Learning for Source Speaker Tracing

Slowness Regularized Contrastive Predictive Coding for Acoustic Unit Discovery

Contrastive Learning for improving End-to-end Speaker Verification

Introducing Phonetic Information to Speaker Embedding for Speaker Verification

Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification

A PCA Method Based on Speaker Session Variability

Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification

PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification

Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation

Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics

New Adaptation Method Using Two-Dimensional PCA for Speaker Verification

Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning

Contrastive Positive Sample Propagation along the Audio-Visual Event Line

LPCSE: Neural Speech Enhancement through Linear Predictive Coding

Speaker Verification With Deep Features

S2VC - A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations

Speaker Verification Based on TES-PCA Classifier and SVM plus FCM Clustering

Improving Speaker Verification Performance Against Long-Term Speaker Variability