Improving Short-Duration Speaker Recognition by Joint Bark-Wavelet Acoustic Feature Coupling and Triplet Dual-Attention Mechanism Network

Yunfei Zi, Shengwu Xiong
DOI: https://doi.org/10.1007/s11277-024-11149-5
IF: 2.017
2024-05-08
Wireless Personal Communications
Abstract: Speaker recognition methods degrade when the input audio signal is short. In this paper, we address speaker recognition from short-duration speech by coupling two proposed acoustic features: Bark-scaled Gaussian Filter Cepstral Coefficients (BGCC) and Perceptual Wavelet Packet Entropy (PWPE). Our approach is based on the observation that BGCC and PWPE together capture comprehensive speech information, including perceptual cues and a high-resolution time–frequency representation. This information enhances the diversity of speaker characteristics and thus improves the accuracy of speaker discrimination. To integrate the two features effectively, we propose a Triplet Dual Attention Mechanism. With this mechanism, the limited features extracted from short utterances can be reused while the discriminative features are enhanced, improving speaker recognition on short-duration audio signals. Extensive analysis on several datasets containing speech samples of different types and lengths confirms that the proposed features and method outperform existing acoustic feature extraction and speaker recognition algorithms, including approaches based on MFCC and LPCC features, GMM-UBM, iVector-PLDA, and ResCNN-Triplet. The experimental results show that the proposed method achieves a significant improvement over existing approaches in short-duration speaker recognition.
telecommunications