An Initial study on Birdsong Re-synthesis Using Neural Vocoders

Rhythm Bhatia, Tomi H. Kinnunen
DOI: https://doi.org/10.48550/arXiv.2209.10479
2022-09-22
Abstract: Modern speech synthesis uses neural vocoders to model raw waveform samples directly. This increased versatility has expanded the scope of vocoders from speech to other domains, such as music. We address another domain: bio-acoustics. We provide initial comparative analysis-resynthesis experiments on birdsong using one traditional vocoder (WORLD) and two neural vocoders (WaveNet autoencoder, parallel WaveGAN). Our subjective results indicate no difference among the three vocoders in terms of species discrimination (ABX test). Nonetheless, the WORLD vocoder samples were rated higher in terms of retaining bird-like qualities (MOS test). All vocoders had difficulty with pitch and voicing. Our results highlight some of the challenges in processing low-quality wildlife audio data.
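The abstract reports results from an ABX test and a MOS test but does not say how the listener responses are scored. The sketch below shows one common way to score such tests, under stated assumptions: each ABX trial is recorded as 1 (correct) or 0 (incorrect) and compared against 50% chance with an exact one-sided binomial test, and MOS is reported as a mean with a normal-approximation 95% confidence interval. The function names and the example numbers are hypothetical illustrations, not the paper's method or data.

```python
import math
from statistics import mean, stdev


def abx_accuracy(responses):
    """Fraction of ABX trials where X was matched to the correct reference.

    Assumption: responses is a list of 1 (correct) / 0 (incorrect) values.
    """
    return sum(responses) / len(responses)


def binomial_p_value(k, n, p=0.5):
    """One-sided exact binomial test: P(K >= k) if listeners only guess.

    Assumption: chance level is p = 0.5, as in a two-alternative ABX trial.
    """
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def mos_with_ci(ratings, z=1.96):
    """Mean opinion score with a normal-approximation confidence interval.

    Assumption: ratings are on a 1-5 scale; z = 1.96 gives roughly 95% coverage.
    """
    m = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, (m - half_width, m + half_width)


if __name__ == "__main__":
    # Hypothetical listener data for one vocoder, for illustration only.
    abx = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
    print("ABX accuracy:", abx_accuracy(abx))
    print("p-value vs chance:", binomial_p_value(sum(abx), len(abx)))
    print("MOS and 95% CI:", mos_with_ci([4, 3, 5, 4, 4]))
```

With these made-up responses, 8 of 10 correct trials give a p-value of about 0.055, so a small listening test may fail to show a difference from chance even at 80% accuracy; this is one reason "no difference" results from subjective tests depend on the number of trials.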
Audio and Speech Processing, Sound, Signal Processing