Soundify: Matching Sound Effects to Video

David Chuan-En Lin, Anastasis Germanidis, Cristóbal Valenzuela, Yining Shi, Nikolas Martelaro
DOI: https://doi.org/10.1145/3586183.3606823
2024-06-25
Abstract: In the art of video editing, sound helps add character to an object and immerse the viewer within a space. Through formative interviews with professional editors (N=10), we found that the task of adding sounds to video can be challenging. This paper presents Soundify, a system that assists editors in matching sounds to video. Given a video, Soundify identifies matching sounds, synchronizes the sounds to the video, and dynamically adjusts panning and volume to create spatial audio. In a human evaluation study (N=889), we show that Soundify is capable of matching sounds to video out-of-the-box for a diverse range of audio categories. In a within-subjects expert study (N=12), we demonstrate the usefulness of Soundify in helping video editors match sounds to video with lighter workload, reduced task completion time, and improved usability.
Sound, Computer Vision and Pattern Recognition, Human-Computer Interaction, Multimedia, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses the difficulty of adding sound effects during video editing. In formative interviews with 10 professional video editors, the authors found that adding sound effects to video is time-consuming and challenging, especially as the amount of footage grows. The paper therefore proposes Soundify, a system that helps editors match sound effects to video. Soundify implements four functions:

1. **Surface**: Automatically finds relevant, high-quality sound effects based on the video content.
2. **Synchronize**: Synchronizes the found sound effects with objects in the video, for example by playing a bicycle sound effect while a bicycle is on screen.
3. **Spatial**: Dynamically adjusts the spatial properties of each sound effect (such as left-right channel balance and volume) as the video content changes.
4. **Stack**: Lets editors layer multiple audio tracks to create a richer sound environment.

To validate Soundify, the authors ran two studies. A large-scale human evaluation study (889 participants) showed that Soundify can match sounds to video out-of-the-box across a diverse range of audio categories. A within-subjects expert study (12 participants) showed that Soundify helps video editors match sounds to video with lighter workload, reduced task completion time, and improved usability.
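To make the Spatial step more concrete, here is a minimal sketch of how left-right balance and volume could be derived from an object's on-screen position. It is not the authors' implementation: the function name `pan_and_gain`, the bounding-box input, the use of relative box area as a stand-in for loudness, and the constant-power pan law are all assumptions chosen for illustration.

```python
import math


def pan_and_gain(box, frame_width, frame_height):
    """Map a bounding box (x, y, w, h) in pixels to (left_gain, right_gain).

    Hypothetical helper: the box is assumed to come from some object
    localization step that this summary does not specify.
    """
    x, y, w, h = box

    # Horizontal center of the object, normalized to [0, 1]
    # (0 = far left of the frame, 1 = far right).
    cx = (x + w / 2) / frame_width
    cx = min(max(cx, 0.0), 1.0)

    # Assumption: a larger on-screen object is closer and therefore louder.
    # The square root softens the growth of volume with area.
    area = (w * h) / (frame_width * frame_height)
    volume = min(max(math.sqrt(area), 0.0), 1.0)

    # Constant-power pan law: left^2 + right^2 stays equal to volume^2,
    # so perceived loudness does not dip as the object crosses the frame.
    theta = cx * math.pi / 2
    return volume * math.cos(theta), volume * math.sin(theta)
```

Calling this once per frame and multiplying the mono sound effect by the two gains would give a stereo track that follows the object. A real system would likely smooth the gains over time to avoid audible jumps between frames.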