MLP-SVNet: A Multi-Layer Perceptrons Based Network for Speaker Verification

Bing Han, Zhengyang Chen, Bei Liu, Yanmin Qian
DOI: https://doi.org/10.1109/ICASSP43922.2022.9747172
2022-01-01
Abstract: Convolution-based and self-attention-based neural networks have both achieved excellent performance in automatic speaker verification. However, convolutional models often struggle to model long-term dependencies because of their limited receptive field, while self-attention models are weak at modeling local information. To address these limitations, we propose a new multi-layer perceptron (MLP) based speaker verification network (MLP-SVNet), which applies MLPs across the temporal and frequency dimensions to capture local and global information at the same time. Experimental results on VoxCeleb show that the proposed model is very competitive with other systems based on convolution or self-attention. In addition, we demonstrate that MLP-SVNet produces complementary embeddings, which can be fused with the state-of-the-art system to further improve performance.
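
The idea of applying MLPs across the temporal and frequency dimensions can be illustrated with a small NumPy sketch. This is not the MLP-SVNet architecture, which the abstract does not specify; it is a generic mixer-style block written under assumptions: a fixed-length input of T frames by F frequency bins, two-layer MLPs with a GELU activation, residual connections, parameter-free layer normalization, and mean pooling over time to form an embedding. All function names and sizes (mixer_block, init_mlp, T=200, F=80, the hidden widths) are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-5):
    # Normalize over the last axis (no learned scale or shift, for brevity).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def init_mlp(rng, dim, hidden):
    # Hypothetical helper: weights for a two-layer MLP mapping dim -> hidden -> dim.
    return (rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim))

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron applied along the last axis of x.
    return gelu(x @ w1 + b1) @ w2 + b2

def mixer_block(x, time_params, freq_params):
    # x has shape (T, F): T frames, F frequency bins.
    # Temporal mixing: transpose so the MLP acts along the time axis;
    # every output frame can depend on every input frame.
    x = x + mlp(layer_norm(x).T, *time_params).T
    # Frequency mixing: the MLP acts along the frequency axis of each frame.
    x = x + mlp(layer_norm(x), *freq_params)
    return x

rng = np.random.default_rng(0)
T, F = 200, 80                       # assumed input size (frames, frequency bins)
feats = rng.normal(size=(T, F))      # stand-in for acoustic features
time_params = init_mlp(rng, T, 64)   # MLP over the temporal dimension
freq_params = init_mlp(rng, F, 128)  # MLP over the frequency dimension

out = mixer_block(feats, time_params, freq_params)
embedding = out.mean(axis=0)         # assumed pooling: average over time
print(out.shape, embedding.shape)
```

Because the temporal MLP in this sketch has weights of size T, it only accepts inputs of exactly T frames; how the actual model handles variable-length utterances is not stated in the abstract.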