SongBsAb: A Dual Prevention Approach against Singing Voice Conversion based Illegal Song Covers

Guangke Chen, Yedi Zhang, Fu Song, Ting Wang, Xiaoning Du, Yang Liu
2024-12-01
Abstract: Singing voice conversion (SVC) automates song covers by converting a source singing voice from a source singer into a new singing voice with the same lyrics and melody as the source, but sounds like being covered by the target singer of some given target singing voices. However, it raises serious concerns about copyright and civil right infringements. We propose SongBsAb, the first proactive approach to tackle SVC-based illegal song covers. SongBsAb adds perturbations to singing voices before releasing them, so that when they are used, the process of SVC will be interfered, leading to unexpected singing voices. Perturbations are carefully crafted to (1) provide a dual prevention, i.e., preventing the singing voice from being used as the source and target singing voice in SVC, by proposing a gender-transformation loss and a high/low hierarchy multi-target loss, respectively; and (2) be harmless, i.e., no side-effect on the enjoyment of protected songs, by refining a psychoacoustic model-based loss with the backing track as an additional masker, a unique accompanying element for singing voices compared to ordinary speech voices. We also adopt a frame-level interaction reduction-based loss and encoder ensemble to enhance the transferability of SongBsAb to unknown SVC models. We demonstrate the prevention effectiveness, harmlessness, and robustness of SongBsAb on five diverse and promising SVC models, using both English and Chinese datasets, and both objective and human study-based subjective metrics. Our work fosters an emerging research direction for mitigating illegal automated song covers.
Sound, Artificial Intelligence, Cryptography and Security, Machine Learning, Multimedia, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the following problem: **preventing the copyright and civil rights infringement caused by illegal song covers based on Singing Voice Conversion (SVC)**. With the development of generative AI, SVC can convert a source singer's voice so that a song sounds as if it were sung by a target singer. Although this lowers the barrier to covering songs, it also raises serious copyright and civil rights concerns. For example, the virtual singer "AI Stefanie Sun" imitates the voice of the famous Chinese female singer Stefanie Sun and has covered more than 1,000 songs by other singers, far exceeding the number of works in Stefanie Sun's 23-year career. Some SVC-generated covers have also received a great deal of attention on social media and have even been submitted as candidates for the Grammy Awards.

To address this problem, the authors propose **SongBsAb**, the first proactive technical defense against illegal SVC-based covers. Its core idea is to add subtle perturbations to the singing voice before the song is released, so that any SVC process that uses the voice is disrupted and the generated cover does not achieve the intended effect. SongBsAb provides dual prevention in two ways:

1. **Identity Disruption**: To prevent the singing voice from being used as the target singer's voice, SongBsAb designs a gender-transformation loss function so that the voice generated by SVC no longer resembles the target singer. This protects the target singer's performance rights and civil rights.
2. **Lyric Disruption**: To prevent the singing voice from being used as the source singer's voice, SongBsAb designs a high/low hierarchy multi-target loss function so that the voice generated by SVC contains ambiguous or different lyrics. This protects the copyright of the source song's lyrics.

In addition, so that the perturbations do not affect listeners' enjoyment of the protected song, SongBsAb uses a psychoacoustic model-based loss with the backing track as an additional masker, which keeps the perturbations below the human auditory threshold.

Overall, SongBsAb aims to prevent the copyright and civil rights infringement enabled by SVC through proactive defense, while ensuring that the quality of the protected song is not affected.
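The perturbation idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy linear map `encoder` as a stand-in for an SVC encoder, uses plain projected gradient descent as the optimizer, and replaces the frequency-domain psychoacoustic masking threshold with a crude per-sample amplitude budget derived from the backing track. The names `masking_budget`, `embedding_distance`, and `craft_perturbation`, and all parameter values, are hypothetical.

```python
import numpy as np

def masking_budget(backing, ratio=0.1, floor=1e-3):
    """Crude per-sample budget (assumption): a louder backing track
    is allowed to hide a larger perturbation."""
    return floor + ratio * np.abs(backing)

def embedding_distance(encoder, waveform, target_emb):
    """Squared Euclidean distance between the toy embedding and a target."""
    diff = encoder @ waveform - target_emb
    return float(diff @ diff)

def craft_perturbation(voice, backing, encoder, target_emb, steps=100):
    """Projected gradient descent that moves the embedding of
    voice + delta toward target_emb while keeping |delta| within the budget."""
    budget = masking_budget(backing)
    # Step size 1/L, where L = 2 * sigma_max(encoder)^2 is the Lipschitz
    # constant of the gradient of the squared distance.
    step = 1.0 / (2.0 * np.linalg.norm(encoder, 2) ** 2)
    delta = np.zeros_like(voice)
    for _ in range(steps):
        residual = encoder @ (voice + delta) - target_emb
        grad = 2.0 * encoder.T @ residual
        # Gradient step, then projection onto the per-sample box constraint.
        delta = np.clip(delta - step * grad, -budget, budget)
    return delta

# Synthetic data standing in for waveforms and an encoder (all hypothetical).
rng = np.random.default_rng(0)
n_samples, emb_dim = 256, 8
encoder = rng.standard_normal((emb_dim, n_samples)) / np.sqrt(n_samples)
voice = rng.standard_normal(n_samples)
backing = rng.standard_normal(n_samples)
# Target embedding, e.g. that of a different (opposite-gender) voice.
target_emb = encoder @ rng.standard_normal(n_samples)

delta = craft_perturbation(voice, backing, encoder, target_emb)
before = embedding_distance(encoder, voice, target_emb)
after = embedding_distance(encoder, voice + delta, target_emb)
print(f"distance to target before: {before:.4f}, after: {after:.4f}")
```

The box projection plays the role of the harmlessness constraint, and the distance term plays the role of a disruption loss. A real system would optimize against neural encoders and compute the masking threshold per frequency band rather than per sample.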