DFR-ECAPA: Diffusion Feature Refinement for Speaker Verification Based on ECAPA-TDNN.

Ya Gao, Wei Song, Xiaobing Zhao, Xiangchun Liu
DOI: https://doi.org/10.1007/978-981-99-8549-4_38
2024-01-01
Abstract: Diffusion probabilistic models have gained significant recognition for their exceptional performance in generative image modeling. In speech processing, however, most diffusion-based studies focus on generative tasks such as speech synthesis and speech conversion, and few apply diffusion models to speaker verification. We investigate the integration of a diffusion model with the ECAPA-TDNN model. Using a dual-branch network architecture, the network further extracts and refines speaker embeddings under the guidance of the intermediate activations of a pre-trained denoising diffusion probabilistic model (DDPM). We put forward two methods for fusing the features of the two branches, both of which yield improvements. Furthermore, the proposed model provides a new solution for semi-supervised cross-domain speaker verification. Experiments on VoxCeleb and CN-Celeb show that DFR-ECAPA outperforms the original ECAPA-TDNN by around 20%.
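The abstract does not say what the two fusion methods are, so the following is only a minimal sketch of how features from two branches might be combined, not the authors' method. It assumes two generic options, concatenation and a weighted sum, and every name in it (`fuse_concat`, `fuse_weighted_sum`, `alpha`, `cosine_score`) is hypothetical. The toy vectors stand in for an ECAPA-TDNN embedding and a diffusion-guided embedding.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_concat(main_emb, diff_emb):
    """Assumed option 1: concatenate the two branch embeddings, then normalize."""
    return l2_normalize(list(main_emb) + list(diff_emb))


def fuse_weighted_sum(main_emb, diff_emb, alpha=0.5):
    """Assumed option 2: element-wise weighted sum, then normalize.

    alpha is a hypothetical mixing weight; both embeddings must share a dimension.
    """
    if len(main_emb) != len(diff_emb):
        raise ValueError("embeddings must have the same dimension")
    mixed = [alpha * a + (1.0 - alpha) * b for a, b in zip(main_emb, diff_emb)]
    return l2_normalize(mixed)


def cosine_score(emb_a, emb_b):
    """Cosine similarity, a common trial score in speaker verification."""
    a, b = l2_normalize(emb_a), l2_normalize(emb_b)
    return sum(x * y for x, y in zip(a, b))


# Toy 2-dimensional embeddings for one utterance (illustrative values only).
main_branch = [1.0, 0.0]
diffusion_branch = [0.0, 1.0]

concat_emb = fuse_concat(main_branch, diffusion_branch)
summed_emb = fuse_weighted_sum(main_branch, diffusion_branch, alpha=0.5)
```

Concatenation keeps both feature sets intact at the cost of a larger embedding, while a weighted sum keeps the dimension fixed but requires the branches to share a feature space. Which trade-off the paper makes is not stated in the abstract.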