RPA-SCD: Rhythm and Pitch Aware Dual-Branch Network for Songs Conversion Detection

Mingshan Du, Hongxia Wang, Rui Zhang, Zihan Yan
DOI: https://doi.org/10.1109/ijcnn60899.2024.10651233
2024-01-01
Abstract: Song voice conversion tools have become increasingly popular in recent years. People upload self-made forged songs, whose timbre has been converted, to video websites. However, singing voice conversion technology may lead to infringement of song copyright. To protect song copyright, methods for detecting singing voice conversion need to be investigated. We propose Rhythm and Pitch Aware Songs Conversion Detection (RPA-SCD), a dual-branch network for song voice conversion detection. RPA-SCD predicts forged song fragments from rhythm and pitch, which are the global and local information of music. To evaluate the proposed method, we contribute a multilingual song conversion detection (MSCD) dataset. Our proposed model achieves an equal error rate (EER) of 2.30% in the original domain of MSCD, which is lower than that of other benchmarks for speech forgery detection. The experiments show that our approach achieves state-of-the-art performance on the song conversion detection task. The MSCD dataset can be found at https://drive.google.com/file/d/1rFsvMYihVtk81uFbL7UpyUEs-qBgsX6H/view?usp=drive_link. The code can be found at https://github.com/Samantha-Du/RPA-SDD.
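The abstract reports results as an equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below shows one common way to estimate it from detector scores. It is a generic illustration, not the authors' evaluation code: the function name `equal_error_rate`, the convention that label 1 means "converted" and that higher scores mean "more likely converted", and the threshold sweep over observed scores are all assumptions made here.

```python
import numpy as np


def equal_error_rate(labels, scores):
    """Estimate the EER from binary labels and detector scores.

    Assumptions (not from the paper): label 1 marks a converted song
    fragment, label 0 a genuine one, and a higher score means the
    detector considers the fragment more likely to be converted.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)

    # Candidate thresholds: every observed score, plus one value above
    # the maximum so that "reject everything" is also considered.
    thresholds = np.unique(scores)
    thresholds = np.concatenate([thresholds, [thresholds[-1] + 1.0]])

    # False acceptance rate: genuine fragments flagged as converted.
    far = np.array([np.mean(scores[~labels] >= t) for t in thresholds])
    # False rejection rate: converted fragments that were missed.
    frr = np.array([np.mean(scores[labels] < t) for t in thresholds])

    # Take the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)


if __name__ == "__main__":
    # Toy data: two genuine and two converted fragments.
    print(equal_error_rate([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
    print(equal_error_rate([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9]))
```

With only a handful of scores the estimate is coarse; on a full test set the threshold sweep becomes dense enough that the nearest crossing point is a reasonable approximation.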