Branch-ECAPA-TDNN: A Parallel Branch Architecture to Capture Local and Global Features for Speaker Verification

Jiadi Yao, Chengdong Liang, Zhendong Peng, Binbin Zhang, Xiao-Lei Zhang
DOI: https://doi.org/10.21437/interspeech.2023-402
2023-01-01
Abstract: ECAPA-TDNN is currently one of the state-of-the-art deep models for automatic speaker verification (ASV). However, it concentrates on local feature extraction over fixed local ranges and pays little attention to global feature extraction. To address this, we propose Branch-ECAPA-TDNN, which uses two parallel branches to extract features at different ranges and levels of abstraction. One branch employs multi-head self-attention to capture long-range dependencies, while the other uses an SE-Res2Block module to model local multi-scale characteristics. To improve feature fusion, we further apply different merging methods to aggregate the features from the two branches. Experimental results show that Branch-ECAPA-TDNN achieves relative equal error rate (EER) reductions of 24.10% and 7.92% over ECAPA-TDNN on the VoxCeleb and CN-Celeb datasets, respectively.
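The parallel-branch idea can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the global branch here is a single-head, projection-free self-attention, the local branch is a fixed moving average standing in for the SE-Res2Block, and the two merge options ("concat" and "add") are assumptions, since the abstract does not name the merging methods. All function names are hypothetical.

```python
import math


def self_attention(frames):
    """Global branch stand-in: single-head scaled dot-product self-attention.

    Assumption: no learned query/key/value projections, unlike real
    multi-head self-attention.
    """
    dim = len(frames[0])
    out = []
    for query in frames:
        # Similarity of this frame to every frame in the utterance.
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        # Numerically stable softmax over all time steps.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * value[d] for w, value in zip(weights, frames))
            for d in range(dim)
        ])
    return out


def local_average(frames, kernel=3):
    """Local branch stand-in: moving average over neighbouring frames.

    Assumption: a fixed averaging window replaces the SE-Res2Block.
    """
    half = kernel // 2
    dim = len(frames[0])
    out = []
    for t in range(len(frames)):
        window = frames[max(0, t - half): t + half + 1]
        out.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return out


def merge(global_feats, local_feats, method="concat"):
    """Fuse the two branch outputs frame by frame (hypothetical options)."""
    if method == "concat":
        return [g + l for g, l in zip(global_feats, local_feats)]
    if method == "add":
        return [
            [a + b for a, b in zip(g, l)]
            for g, l in zip(global_feats, local_feats)
        ]
    raise ValueError(f"unknown merge method: {method}")


def branch_block(frames, method="concat"):
    """Run both branches on the same input in parallel, then merge."""
    return merge(self_attention(frames), local_average(frames), method)


# Four frames of 2-dimensional features (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
fused = branch_block(frames, method="concat")
print(len(fused), len(fused[0]))
```

With "concat" the fused feature dimension doubles, whereas "add" keeps it unchanged; which trade-off is preferable depends on the layers that follow.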