Speaker Embedding Extraction with Multi-feature Integration Structure

Zheng Li, Hao Lu, Jianfeng Zhou, Lin Li, Qingyang Hong
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023103
2019-01-01
Abstract: Recently, the x-vector has achieved promising performance on the speaker verification task and has become one of the mainstream systems. In this paper, we analyze feature engineering based on the x-vector structure and propose a multi-feature integration method to further improve the representation of speaker characteristics. The proposed method can be implemented in two ways, with symmetric branches or with asymmetric branches, to incorporate different types of acoustic features in one neural network. Each branch processes one type of acoustic feature at the frame level, and the outputs of the two branches for each frame are spliced together into a super vector before being fed into the statistics pooling layer. Experiments on the VoxCeleb1 data set show that the proposed multi-feature integration method obtains a 22.8% relative improvement in EER over the baseline.
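The splice-then-pool step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `splice_frames` and `statistics_pooling` are hypothetical, the branch outputs are hand-written toy values rather than the outputs of trained frame-level layers, and the use of the population standard deviation is an assumption.

```python
import math


def splice_frames(branch_a, branch_b):
    """Concatenate the per-frame outputs of two branches into one super vector per frame."""
    if len(branch_a) != len(branch_b):
        raise ValueError("both branches must cover the same number of frames")
    return [list(a) + list(b) for a, b in zip(branch_a, branch_b)]


def statistics_pooling(frames):
    """Return the per-dimension mean followed by the per-dimension standard deviation.

    Assumption: population standard deviation (divide by the number of frames).
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


# Toy frame-level outputs (hypothetical values): two frames per branch,
# one branch per acoustic feature type.
branch_a = [[1.0, 2.0], [3.0, 4.0]]
branch_b = [[0.0], [2.0]]

spliced = splice_frames(branch_a, branch_b)   # one super vector per frame
pooled = statistics_pooling(spliced)          # utterance-level statistics
print(spliced)
print(pooled)
```

Splicing before pooling means the pooled statistics cover both feature types at once, so the layers after pooling see a single utterance-level vector regardless of how many branches feed it.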