TimbreCLIP: Connecting Timbre to Text and Images

Nicolas Jonason, Bob L.T. Sturm
DOI: https://doi.org/10.48550/arXiv.2211.11225
2022-11-21
Abstract: We present work in progress on TimbreCLIP, an audio-text cross-modal embedding trained on single instrument notes. We evaluate the models with a cross-modal retrieval task on synth patches. Finally, we demonstrate the application of TimbreCLIP on two tasks: text-driven audio equalization and timbre-to-image generation.
Sound, Machine Learning, Audio and Speech Processing
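
The cross-modal retrieval evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that text and audio are already mapped into a shared embedding space and that candidates are ranked by cosine similarity, as is common for CLIP-style models; the abstract does not confirm the similarity measure. The function names and the toy vectors below are hypothetical stand-ins, not outputs of TimbreCLIP.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, candidate_embeddings):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_embedding, c) for c in candidate_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy embeddings: one text query and three synth-patch recordings.
# In a real system these would come from trained text and audio encoders.
text_embedding = [0.9, 0.1, 0.0]
audio_embeddings = [
    [0.1, 0.9, 0.2],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 1.0],
]

ranking = retrieve(text_embedding, audio_embeddings)
print(ranking)  # indices of the audio candidates, best match first
```

Retrieval quality is then typically summarized by how often the correct item appears near the top of such a ranking.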