Abstract: Singing voice conversion (SVC) automates song covers by converting a source singing voice from a source singer into a new singing voice with the same lyrics and melody as the source, but which sounds as if it were covered by the target singer of some given target singing voices. However, it raises serious concerns about copyright and civil rights infringement. We propose SongBsAb, the first proactive approach to tackle SVC-based illegal song covers. SongBsAb adds perturbations to singing voices before releasing them, so that when they are used, the SVC process is disrupted, leading to unexpected singing voices. Perturbations are carefully crafted to (1) provide a dual prevention, i.e., preventing the singing voice from being used as the source and target singing voice in SVC, by proposing a gender-transformation loss and a high/low hierarchy multi-target loss, respectively; and (2) be harmless, i.e., have no side effects on the enjoyment of protected songs, by refining a psychoacoustic model-based loss with the backing track as an additional masker, a unique accompanying element for singing voices compared to ordinary speech voices. We also adopt a frame-level interaction reduction-based loss and encoder ensemble to enhance the transferability of SongBsAb to unknown SVC models. We demonstrate the prevention effectiveness, harmlessness, and robustness of SongBsAb on five diverse and promising SVC models, using both English and Chinese datasets, and both objective and human study-based subjective metrics. Our work fosters an emerging research direction for mitigating illegal automated song covers.

What problem does this paper attempt to address?

The problem this paper addresses is **preventing the infringement of copyright and civil rights by illegal song covers based on Singing Voice Conversion (SVC)**. With the development of generative AI, SVC lets people generate new versions of songs that sound as if they were sung by a target singer, by converting the voice of the source singer. Although this technology lowers the threshold for covering songs, it also raises serious concerns about copyright and civil rights infringement. For example, the virtual singer "AI Stefanie Sun" imitates the voice of the famous Chinese female singer Stefanie Sun and has covered more than 1,000 songs by other singers, far exceeding the number of works in Stefanie Sun's 23-year career. In addition, some cover songs generated by SVC have received a great deal of attention on social media and have even been submitted as candidates for the Grammy Awards.

To address this problem, the authors propose **SongBsAb**, the first technical solution for proactively defending against illegal SVC covers. The core idea of SongBsAb is to add subtle perturbations to the singing voice before the song is released, so that the SVC process is disrupted and the generated cover song does not achieve the expected effect. SongBsAb achieves dual prevention in the following two ways:

1. **Identity Disruption**: To prevent the singing voice from being used as the target singer's voice, SongBsAb designs a gender-transformation loss function, so that the voice generated by SVC no longer resembles the target singer's voice. This protects the target singer's performance rights and civil rights.
2. **Lyric Disruption**: To prevent the singing voice from being used as the source singer's voice, SongBsAb designs a high/low hierarchy multi-target loss function, so that the voice generated by SVC contains ambiguous or different lyrics, thus protecting the copyright of the source song's lyrics.

In addition, to ensure that the perturbations do not affect the listener's enjoyment of the protected song, SongBsAb uses a psychoacoustic model with the backing track as an additional masker, keeping the perturbations below the human auditory threshold so that they cause no side effects.

Overall, SongBsAb aims to prevent the copyright and civil rights infringement enabled by SVC technology through proactive defense, while ensuring that the quality of the protected song is not affected.
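The idea of adding a small, bounded perturbation that steers an encoder's output away from the original can be illustrated with a toy sketch. This is not the authors' method: the linear `encoder`, the `decoy_embedding`, the function name `protect`, and all parameter values are hypothetical, and a simple L-infinity bound (`epsilon`) stands in for the psychoacoustic, backing-track-aware masking constraint described above. The sketch only shows the general pattern of a signed-gradient step followed by projection onto a perturbation budget.

```python
import numpy as np


def protect(voice, encoder, decoy_embedding, epsilon=0.01, step=0.002, iterations=50):
    """Return voice + delta with |delta| <= epsilon, chosen so that the toy
    encoder's output moves toward decoy_embedding (all names are hypothetical)."""
    delta = np.zeros_like(voice)
    for _ in range(iterations):
        residual = encoder @ (voice + delta) - decoy_embedding
        # Gradient of 0.5 * ||encoder @ (voice + delta) - decoy_embedding||^2 w.r.t. delta.
        gradient = encoder.T @ residual
        # Signed-gradient step, then projection back onto the L-infinity ball.
        delta = np.clip(delta - step * np.sign(gradient), -epsilon, epsilon)
    return voice + delta


rng = np.random.default_rng(0)
encoder = rng.standard_normal((8, 64))          # stand-in for a real SVC encoder
voice = 0.1 * rng.standard_normal(64)           # stand-in for a waveform segment
decoy_embedding = 5.0 * rng.standard_normal(8)  # embedding of a different identity

protected = protect(voice, encoder, decoy_embedding)
before = np.linalg.norm(encoder @ voice - decoy_embedding)
after = np.linalg.norm(encoder @ protected - decoy_embedding)
print(f"max |delta| = {np.max(np.abs(protected - voice)):.4f}")
print(f"distance to decoy embedding: {before:.3f} -> {after:.3f}")
```

In a real system the encoder would be a neural network (or an ensemble of them, as the abstract mentions for transferability), the gradient would come from automatic differentiation, and the budget would be set by a masking threshold rather than a fixed `epsilon`.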

SongBsAb: A Dual Prevention Approach against Singing Voice Conversion based Illegal Song Covers
