Abstract: Speaker recognition accuracy degrades when the input audio signal is short. In this paper, we address speaker recognition from short-duration speech by coupling two proposed acoustic features: Bark-scaled Gaussian Filter Cepstral Coefficients (BGCC) and Perceptual Wavelet Packet Entropy (PWPE). Our approach is based on the observation that BGCC and PWPE capture comprehensive information about speech, such as speech perception characteristics and a high-resolution time–frequency representation. This information enhances the diversity of speaker characteristics and thus improves the accuracy of speaker discrimination. To integrate these two features effectively, we propose a Triplet Dual Attention Mechanism. This mechanism reuses the limited features extracted from short utterances while enhancing the discriminative ones, improving speaker recognition on short-duration audio signals. Extensive experiments on several datasets containing speech samples of different types and lengths show that the proposed features and method outperform existing acoustic feature extraction and speaker recognition algorithms, including approaches based on MFCC and LPCC features, GMM-UBM, iVector-PLDA, and ResCNN-Triplet. The experimental results show a significant improvement over existing approaches in short-duration speaker recognition.

Robust Speaker Recognition Based on Multi-Stream Features

Glottal Information Based Spectral Recuperation in Multi-channel Speaker Recognition

Robust Speaker Recognition in Cross-Channel Condition

Robust Speaker Recognition in Cross-Channel Condition Based on Gaussian Mixture Model

Robust speaker recognition using glottal information‐based cepstral mean subtraction

Modified MFCCs for Robust Speaker Recognition

Improved multitaper PNCC feature for robust speaker verification

A novel hybrid feature method based on Caelen auditory model and gammatone filterbank for robust speaker recognition under noisy environment and speech coding distortion

Short Utterance Speaker Recognition Based on Speech High Frequency Information Compensation and Dynamic Feature Enhancement Methods

Speaker Recognition Using DMFCC over Telephone Channels

A Novel I-Vector Framework Using Multiple Features and PCA for Speaker Recognition in Short Speech Condition

Wavoice: An mmWave-Assisted Noise-Resistant Speech Recognition System

Improving Short-Duration Speaker Recognition by Joint Bark-Wavelet Acoustic Feature Coupling and Triplet Dual-Attention Mechanism Network

Multi-resolution Time Frequency Feature and Complementary Combination for Short Utterance Speaker Recognition

Robust Front-End for Speech Recognition Based on Computational Auditory Scene Analysis and Speaker Model

Multi-feature Combination for Speaker Recognition

Noise Robust Speaker Recognition Based on Adaptive Frame Weighting in GMM for i-Vector Extraction

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

A perceptually-motivated low-complexity instantaneous linear channel normalization technique applied to speaker verification

Hands-free speaker identification based on spectral subtraction using a multi-channel least mean square approach

Wav2sv: End-to-end Speaker Embeddings Learning from Raw Waveforms Based on Metric Learning for Speaker Verification