HiFi-GANw: Watermarked Speech Synthesis via Fine-Tuning of HiFi-GAN

Xiangyu Cheng, Yaofei Wang, Chang Liu, Donghui Hu, Zhaopin Su
DOI: https://doi.org/10.1109/lsp.2024.3456673
2024-09-21
IEEE Signal Processing Letters
Abstract: Advancements in speech synthesis technology bring generated speech closer to natural human voices, but they also introduce potential risks, such as the dissemination of false information and voice impersonation. It is therefore important to be able to detect misuse of released speech content. This letter introduces an active strategy that combines audio watermarking with the HiFi-GAN vocoder to embed an imperceptible watermark in all synthesized speech for detection purposes. We first pre-train a watermark extraction network as the watermark extractor, and then use the extractor's watermark extraction loss, together with a speech quality loss, to fine-tune the HiFi-GAN generator so that the watermark can be extracted from the synthesized speech. We evaluate the imperceptibility and robustness of the watermark across various speech synthesis models. The experimental results demonstrate that our method effectively withstands various attacks and exhibits excellent imperceptibility. Moreover, our method is universal and compatible with various vocoder-based speech synthesis models.
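The two-term fine-tuning objective described in the abstract can be illustrated with a minimal sketch. The abstract names only a watermark extraction loss and a speech quality loss; the concrete forms used here (binary cross-entropy on extracted bits, mean absolute error on waveform samples, and a weighting factor `lam`) are assumptions chosen for illustration and are not taken from the paper. All function names are hypothetical.

```python
import math

def watermark_loss(logits, bits):
    """Assumed form: binary cross-entropy between extractor logits and target bits."""
    total = 0.0
    for z, b in zip(logits, bits):
        # Numerically stable BCE-with-logits: max(z, 0) - z*b + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * b + math.log1p(math.exp(-abs(z)))
    return total / len(bits)

def quality_loss(synth, reference):
    """Assumed stand-in for speech quality: mean absolute error between waveforms."""
    return sum(abs(s - r) for s, r in zip(synth, reference)) / len(reference)

def fine_tune_loss(logits, bits, synth, reference, lam=1.0):
    """Combined objective: extraction term plus weighted quality term (lam is hypothetical)."""
    return watermark_loss(logits, bits) + lam * quality_loss(synth, reference)
```

In a sketch like this, a larger `lam` would favor keeping the synthesized waveform close to the reference, while a smaller one would favor reliable watermark extraction; how the paper balances the two terms is not stated in the abstract.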
engineering, electrical & electronic