VocEmb4SVS: Improving Singing Voice Separation with Vocal Embeddings

Chenyi Li, Yi Li, Xuhao Du, Yaolong Ju, Shichao Hu, Zhiyong Wu
DOI: https://doi.org/10.23919/apsipaasc55919.2022.9980293
2022-01-01
Abstract: Deep learning-based methods have shown promising performance on singing voice separation (SVS). Recently, embeddings related to lyrics and voice activity have proven effective at improving SVS performance. However, embeddings related to the singer have not yet been studied. In this paper, we propose VocEmb4SVS, an SVS framework that uses vocal embeddings of the singer as auxiliary knowledge for conditioning the separation. First, a pre-trained separation network is employed to obtain pre-separated vocals from the mixed music signal. Second, a vocal encoder is trained to extract vocal embeddings from the pre-separated vocals. Finally, the vocal embeddings are integrated into the separation network to improve SVS performance. Experimental results show that the proposed method achieves state-of-the-art performance on the MUSDB18 dataset, with a signal-to-distortion ratio (SDR) of 9.56 dB on vocals.
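The three stages described in the abstract can be read as a simple data flow: mixture, then pre-separated vocals, then a vocal embedding, then a conditioned separation. The Python sketch below only illustrates that flow. Every function name and every function body is a hypothetical stand-in (simple NumPy arithmetic), not the authors' networks, and the abstract does not say how the embedding is integrated into the separator. The `sdr_db` helper uses the plain energy-ratio definition of SDR, which is an assumption; benchmark evaluations on MUSDB18 commonly use a more elaborate BSS Eval variant.

```python
import numpy as np

def pre_separate(mixture):
    """Stage 1 stand-in for a pre-trained separator (hypothetical: fixed scaling)."""
    return 0.5 * mixture

def encode_vocals(pre_vocals, dim=8):
    """Stage 2 stand-in for the vocal encoder (hypothetical: mean magnitude per chunk)."""
    usable = len(pre_vocals) // dim * dim      # drop samples that do not fill a chunk
    chunks = pre_vocals[:usable].reshape(dim, -1)
    return np.abs(chunks).mean(axis=1)         # fixed-size vector of length `dim`

def separate_with_embedding(mixture, embedding):
    """Stage 3 stand-in: separation conditioned on the embedding (hypothetical: one gain)."""
    gain = 1.0 / (1.0 + embedding.mean())
    return gain * mixture

def run_pipeline(mixture):
    """Chain the three stages in the order given in the abstract."""
    pre_vocals = pre_separate(mixture)
    embedding = encode_vocals(pre_vocals)
    vocals = separate_with_embedding(mixture, embedding)
    return embedding, vocals

def sdr_db(reference, estimate, eps=1e-12):
    """Plain SDR in dB: reference energy over error energy (simplified definition)."""
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))
```

For instance, calling `run_pipeline` on a one-second mono signal of 16000 samples returns an 8-dimensional embedding and a vocal estimate of the same length as the input. In a real system each stand-in would be replaced by a trained network.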