Spectrogram Inpainting for Interactive Generation of Instrument Sounds

Théis Bazin, Gaëtan Hadjeres, Philippe Esling, Mikhail Malt
DOI: https://doi.org/10.30746/978-91-519-5560-5
2021-04-15
Abstract: Modern approaches to sound synthesis using deep neural networks are hard to control, especially when fine-grained conditioning information is not available, which hinders their adoption by musicians. In this paper, we cast the generation of individual instrumental notes as an inpainting-based task, introducing novel ways to iteratively shape sounds. To this end, we propose a two-step approach. First, we adapt the VQ-VAE-2 image generation architecture to spectrograms in order to convert real-valued spectrograms into compact discrete codemaps. We then implement token-masked Transformers for the inpainting-based generation of these codemaps. We apply the proposed architecture to masked resampling tasks on the NSynth dataset. Most crucially, we open-source an interactive web interface for transforming sounds by inpainting, aimed at artists and practitioners alike and opening up new creative uses.
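The two-step pipeline in the abstract can be illustrated with a toy sketch in plain Python. It assumes an encoder has already turned a spectrogram into a (frequency × time) grid of feature vectors, and shows only two ideas: nearest-neighbour vector quantization of that grid into a discrete codemap, and masking a span of time frames so those tokens can be regenerated. The encoder and decoder, the two-level hierarchy of VQ-VAE-2 and the Transformer itself are omitted. `uniform_predictor` is a hypothetical stand-in for a trained model, and every name and size here (`CODEBOOK_SIZE`, `EMBED_DIM`, `MASK_TOKEN`, `inpaint`) is illustrative rather than taken from the authors' code.

```python
import random

# Hypothetical toy sizes, chosen only for illustration.
CODEBOOK_SIZE = 8
EMBED_DIM = 4
MASK_TOKEN = -1  # hypothetical sentinel marking tokens to regenerate


def quantize(features, codebook):
    """Replace each feature vector in a (freq x time) grid by the index
    of its nearest codebook entry (squared Euclidean distance)."""
    codemap = []
    for row in features:
        code_row = []
        for vec in row:
            dists = [sum((v - c) ** 2 for v, c in zip(vec, code)) for code in codebook]
            code_row.append(min(range(len(codebook)), key=lambda k: dists[k]))
        codemap.append(code_row)
    return codemap


def mask_time_region(codemap, start, end):
    """Mask every token whose time index lies in [start, end), in all frequency rows."""
    return [[MASK_TOKEN if start <= t < end else tok for t, tok in enumerate(row)]
            for row in codemap]


def masked_positions(codemap):
    """List the (freq, time) coordinates of masked tokens, row by row."""
    return [(f, t) for f, row in enumerate(codemap)
            for t, tok in enumerate(row) if tok == MASK_TOKEN]


def uniform_predictor(codemap, f, t):
    """Hypothetical stand-in for a trained Transformer: uniform weights over the codebook."""
    return [1.0] * CODEBOOK_SIZE


def inpaint(masked, predict, rng):
    """Fill masked tokens one at a time in row-major order, sampling each from
    predict(); later samples see earlier ones. Unmasked tokens are kept as-is."""
    out = [row[:] for row in masked]
    for f, t in masked_positions(masked):
        weights = predict(out, f, t)
        out[f][t] = rng.choices(range(len(weights)), weights=weights)[0]
    return out


rng = random.Random(0)
codebook = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(CODEBOOK_SIZE)]
true_codes = [[0, 1, 2, 3, 4], [5, 6, 7, 0, 1]]  # 2 frequency rows x 5 time frames
features = [[codebook[k] for k in row] for row in true_codes]  # stand-in encoder output

codemap = quantize(features, codebook)       # recovers true_codes
masked = mask_time_region(codemap, 1, 3)     # time frames 1 and 2 become MASK_TOKEN
filled = inpaint(masked, uniform_predictor, rng)
```

Masking whole time frames corresponds to regenerating a temporal segment of a note while keeping its surroundings; in the actual system the predictor would be the trained token-masked Transformer conditioned on the unmasked tokens, and the filled codemap would be decoded back into a spectrogram.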
Sound, Artificial Intelligence, Human-Computer Interaction, Audio and Speech Processing