Abstract: Speaker verification performance degrades when the test speech is recorded in sessions spread over a long period of time. Common ways to alleviate this long-term degradation are enrollment data augmentation, speaker model adaptation, and adapted verification thresholds. From the feature point of view of a pattern recognition system, robust features that are speaker-specific and invariant to time and acoustic environment are preferred for dealing with this long-term variability. In this paper, using a newly created speech database, CSLT-Chronos, collected specifically to reflect long-term speaker variability, we investigate the issue in the frequency domain by emphasizing higher discrimination for speaker-specific information and lower sensitivity to time-related, session-specific information. The F-ratio is employed as the criterion for a figure of merit that judges these two kinds of information and finds a compromise between them. Inspired by the feature extraction procedure of the traditional MFCC calculation, two emphasis strategies are explored for generating modified acoustic features for speaker verification: pre-filtering frequency warping and post-filtering weighting of the filter-bank outputs. Experiments show that the two proposed features outperform the traditional MFCC on CSLT-Chronos. The proposed approach is also studied on the NIST SRE 2008 database in a state-of-the-art i-vector based architecture. Experimental results demonstrate the advantage of the proposed features over MFCC in LDA and PLDA based i-vector systems. (c) 2016 Elsevier B.V. All rights reserved.
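
As a rough illustration of the F-ratio criterion mentioned in the abstract, the sketch below computes a per-band F-ratio (variance of the class means divided by the mean within-class variance) once with speakers as classes and once with sessions as classes, then turns the two into filter-bank weights. The abstract does not give the paper's actual figure of merit, so the combination used here (speaker F-ratio divided by session F-ratio) and the names `f_ratio` and `band_weights` are assumptions made for illustration only.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def f_ratio(values, labels):
    """Per-band F-ratio: variance of class means / mean within-class variance.

    values: array of shape (n_samples, n_bands), e.g. log filter-bank energies
    labels: array of shape (n_samples,), class label of each sample
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    class_means = np.stack([values[labels == c].mean(axis=0) for c in classes])
    class_vars = np.stack([values[labels == c].var(axis=0) for c in classes])
    between = class_means.var(axis=0)   # spread of the class means
    within = class_vars.mean(axis=0)    # average spread inside a class
    return between / (within + EPS)


def band_weights(log_energies, speaker_ids, session_ids):
    """Hypothetical compromise: favor bands with a high speaker F-ratio
    and a low session F-ratio. Weights are scaled to have mean 1."""
    f_speaker = f_ratio(log_energies, speaker_ids)
    f_session = f_ratio(log_energies, session_ids)
    merit = f_speaker / (f_session + EPS)  # assumed combination, not the paper's
    return merit * merit.size / merit.sum()
```

The resulting weights could multiply the filter-bank outputs before the log and DCT steps of an MFCC pipeline, which corresponds loosely to the post-filtering weighting strategy; the pre-filtering frequency warping strategy is not sketched here.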

Using Phoneme Recognition and Text-Dependent Speaker Verification to Improve Speaker Segmentation for Chinese Speech.

Research on Speaker-Depended Isolated-Word Speech Recognition System

A framework of text-dependent speaker verification for Chinese numerical string corpus

A text-dependent speaker verification application framework based on Chinese numerical string corpus

Speaker Segmentation Based on Between-Window Correlation over Speakers' Characteristics

Exploring Sequential Characteristics in Speaker Bottleneck Feature for Text-Dependent Speaker Verification.

VB-HMM Speaker Diarization with Enhanced and Refined Segment Representation.

Wavoice: an Mmwave-Assisted Noise-Resistant Speech Recognition System

TRSD: A Time-Varying and Region-Changed Speech Database for Speaker Recognition

Speaker Segmentation Using Deep Speaker Vectors For Fast Speaker Change Scenarios

A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition.

Asymmetric Clean Segments-Guided Self-Supervised Learning for Robust Speaker Verification

Automatic Segmentation for TTS Units

Deep Segment Attentive Embedding for Duration Robust Speaker Verification

Short Utterance Speaker Recognition Based on Speech High Frequency Information Compensation and Dynamic Feature Enhancement Methods

An Improved Speaker Based Speech Segmentation Algorithm

Improving Speaker Verification Performance Against Long-Term Speaker Variability

Refining phoneme segmentations using speaker-adaptive context dependent boundary models.

Robust Front-End for Speech Recognition Based on Computational Auditory Scene Analysis and Speaker Model

Real-time multilingual speech recognition and speaker diarization system based on Whisper segmentation

A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments