Abstract: This work aims to compensate further for two weaknesses of existing short-duration speaker recognition: feature sparsity and a lack of discriminative acoustic features. To address these issues, we propose two acoustic feature extraction methods. The first is Bark-scaled Gauss and linear filter bank superposition cepstral coefficients (BGLCC). The second is multidimensional central difference (MDCD) features. The Bark-scaled Gauss filter bank concentrates on low-frequency information, while the linear filter bank is distributed uniformly across frequency. Superposing the two therefore yields richer and more discriminative acoustic features from short-duration audio signals. In addition, the multidimensional central difference method better captures the dynamic features of speakers, which improves the performance of short-utterance speaker verification. Extensive experiments are conducted on short-duration text-independent speaker verification datasets generated from the VoxCeleb, SITW, and NIST SRE corpora. These datasets contain speech samples of diverse lengths from different scenarios. The results show that the proposed method outperforms existing acoustic feature extraction approaches by at least 10% on the test set. Ablation experiments further show that the proposed approaches achieve substantial improvements over prior methods.
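The abstract describes the feature pipeline only at a high level. The following is a minimal, hedged sketch of how such a pipeline could look. It is not the authors' implementation, and several details are assumptions. The Bark scale uses the Traunmüller (1990) approximation. Each Gaussian filter is one Bark step wide. "Superposition" is taken to mean adding the Bark-Gauss and linear triangular filter weights channel by channel before the log and DCT. MDCD is assumed to be central differences taken along both the time (frame) axis and the cepstral-coefficient axis, concatenated with the static coefficients. The helper names (`bark_gauss_bank`, `linear_tri_bank`, `bglcc_frame`, `central_difference`, `mdcd`) are hypothetical, and random spectra stand in for real speech frames.

```python
import math
import random

def hz_to_bark(f):
    # Traunmüller (1990) Bark approximation (assumed; the paper may use another formula).
    return 26.81 * f / (1960.0 + f) - 0.53

def bin_hz(k, n_bins, fs):
    # Centre frequency of FFT bin k when n_bins = n_fft // 2 + 1.
    return k * fs / (2.0 * (n_bins - 1))

def bark_gauss_bank(n_filters, n_bins, fs):
    # Gaussians equally spaced on the Bark axis: dense, narrow filters at low
    # frequencies. Width = one Bark step (an assumption).
    lo, hi = hz_to_bark(0.0), hz_to_bark(fs / 2.0)
    step = (hi - lo) / (n_filters + 1)
    return [[math.exp(-0.5 * ((hz_to_bark(bin_hz(k, n_bins, fs)) - (lo + m * step)) / step) ** 2)
             for k in range(n_bins)]
            for m in range(1, n_filters + 1)]

def linear_tri_bank(n_filters, n_bins, fs):
    # Triangular filters whose centres are uniformly spaced in Hz.
    step = (fs / 2.0) / (n_filters + 1)
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = (m - 1) * step, m * step, (m + 1) * step
        row = []
        for k in range(n_bins):
            f = bin_hz(k, n_bins, fs)
            if left <= f <= centre:
                row.append((f - left) / step)
            elif centre < f <= right:
                row.append((right - f) / step)
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def bglcc_frame(power, fs, n_filters=24, n_ceps=13):
    # Hypothetical helper: cepstra of one frame's power spectrum using the
    # channel-wise sum of the two banks (assumed meaning of "superposition").
    n_bins = len(power)
    g = bark_gauss_bank(n_filters, n_bins, fs)
    l = linear_tri_bank(n_filters, n_bins, fs)
    log_e = [math.log(sum(p * (a + b) for p, a, b in zip(power, gm, lm)) + 1e-10)
             for gm, lm in zip(g, l)]
    # DCT-II of the log filter-bank energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * n * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for n in range(n_ceps)]

def central_difference(frames, axis):
    # (x[i+1] - x[i-1]) / 2 along axis 0 (time) or 1 (coefficients),
    # replicating the edge values at the boundaries.
    T, D = len(frames), len(frames[0])
    out = []
    for t in range(T):
        row = []
        for i in range(D):
            if axis == 0:
                nxt, prv = frames[min(t + 1, T - 1)][i], frames[max(t - 1, 0)][i]
            else:
                nxt, prv = frames[t][min(i + 1, D - 1)], frames[t][max(i - 1, 0)]
            row.append((nxt - prv) / 2.0)
        out.append(row)
    return out

def mdcd(frames):
    # Assumed MDCD: static + time-axis delta + coefficient-axis delta per frame.
    dt = central_difference(frames, axis=0)
    dq = central_difference(frames, axis=1)
    return [c + a + b for c, a, b in zip(frames, dt, dq)]

# Usage: five random "frames" from a 512-point FFT at 16 kHz.
random.seed(0)
fs, n_bins = 16000, 257
frames = [bglcc_frame([random.random() for _ in range(n_bins)], fs) for _ in range(5)]
feats = mdcd(frames)
print(len(feats), len(feats[0]))  # 5 39
```

Equal spacing on the Bark axis is what gives the Gaussian bank its low-frequency emphasis. The linear bank keeps uniform resolution at high frequencies, so the sum covers both regions. With 13 static coefficients, the assumed MDCD step yields 39 values per frame.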

Speaker Characterization Using Spectral Subband Energy Ratio Based on Harmonic Plus Noise Model

Reliability detection by Fuzzy SVM with UBM Component feature for emotional speaker recognition

Glottal Information Based Spectral Recuperation in Multi-channel Speaker Recognition

Robust Feature Based on Speech Harmonic Structure for Speaker Identification

Prosodic Features-Based Speaker Verification Using Speaker-Specific-text for Short Utterances

Speaker Verification Based on Prosodic Features

Prosodic Features Based Text-dependent Speaker Recognition with Short Utterance

Robust FHPD Features from Speech Harmonic Analysis for Speaker Identification

Short Utterance Speaker Recognition Based on Speech High Frequency Information Compensation and Dynamic Feature Enhancement Methods

Using Subband Mel-spectrum Centroid and Gaussian Mixture Correlation for Robust Speaker Identification

Feature Extraction and Test Algorithm for Speaker Verification

Non-negative matrix factorization based discriminative features for speaker verification

Auditory model-based speech feature extraction and its application to speaker identification

Exploiting Prosodic Information for Speaker Recognition

Recuperating Spectral Features Using Glottal Information And Its Application To Speaker Recognition

Speaker Verification Using Simple Temporal Features and Pitch Synchronous Cepstral Coefficients

Subspace construction and selection for speaker recognition

Robust Speaker Identification Using An Auditory-Based Feature

Extracting Supra-Segment Information for Text-Independent Speaker Verification

Study on Speaker Verification on Emotional Speech