Prompt-guided Precise Audio Editing with Diffusion Models

Manjie Xu, Chenxing Li, Duzhen Zhang, Dan Su, Wei Liang, Dong Yu
2024-05-11
Abstract: Audio editing involves the arbitrary manipulation of audio content through precise control. Although text-guided diffusion models have made significant advancements in text-to-audio generation, they still face challenges in finding a flexible and precise way to modify target events within an audio track. We present a novel approach, referred to as PPAE, which serves as a general module for diffusion models and enables precise audio editing. The editing is based on the input textual prompt only and is entirely training-free. We exploit the cross-attention maps of diffusion models to facilitate accurate local editing and employ a hierarchical local-global pipeline to ensure a smoother editing process. Experimental results highlight the effectiveness of our method in various editing tasks.
Sound, Artificial Intelligence, Machine Learning, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses the difficulty of achieving precise control in audio editing. Text-guided diffusion models now generate high-quality audio from text, but they still struggle to flexibly and precisely modify a target event within an audio track, for example replacing one specific sound in a clip. The paper proposes Prompt-guided Precise Audio Editing (PPAE), a general module for diffusion models that edits audio from the input text prompt alone and requires no additional training. PPAE exploits the cross-attention maps of the diffusion model to localize edits accurately, and it uses a hierarchical local-global pipeline to make the editing process smoother. The reported experiments show that the method is effective across a variety of editing tasks.
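To make the role of cross-attention maps more concrete, the sketch below shows a generic cross-attention layer in NumPy with an optional per-token reweighting of the attention map. It is an illustration of the general idea only, not the PPAE implementation: the function name `cross_attention`, the `token_scale` parameter, and the choice to renormalize after reweighting are all assumptions made for this example.

```python
import numpy as np

def cross_attention(q, k, v, token_scale=None):
    """Scaled dot-product cross-attention with optional per-token reweighting.

    q: (n_latent, d) queries from audio latent positions
    k, v: (n_tokens, d) keys and values from prompt token embeddings
    token_scale: optional (n_tokens,) multipliers for the attention map
                 (hypothetical knob, not taken from the paper)
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    # Subtract the row maximum for a numerically stable softmax.
    logits -= logits.max(axis=-1, keepdims=True)
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)
    if token_scale is not None:
        # Scale each token's column, then renormalize each row (an assumption).
        attn = attn * token_scale
        attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v, attn

# Toy usage: 6 latent positions, 3 prompt tokens, feature size 4.
rng = np.random.default_rng(0)
q = rng.normal(size=(6, 4))
k = rng.normal(size=(3, 4))
v = rng.normal(size=(3, 4))
_, base_map = cross_attention(q, k, v)
_, edited_map = cross_attention(q, k, v, token_scale=np.array([1.0, 1.0, 2.0]))
```

Each row of the attention map says how strongly one latent position attends to each prompt token, so boosting one token's column shifts the output toward that token's content at the positions that already attend to it. This is why such maps are a natural handle for local, prompt-driven edits.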