AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos With Deep Learning

Sanchita Ghose, John Jeffrey Prevost
DOI: https://doi.org/10.1109/tmm.2020.3005033
IF: 7.3
2021-01-01
IEEE Transactions on Multimedia
Abstract: In movie productions, the Foley artist is responsible for creating an overlay soundtrack that helps the movie come alive for the audience. This requires the artist to identify sounds that enhance the experience for the listener, reinforcing the director's intention for the scene. The artist must decide what artificial sound captures the essence of the sound and action depicted in the scene. In this paper, we present AutoFoley, an automated deep-learning tool that is used to synthesize a representative audio track for videos. AutoFoley is used to associate audio files with soundless video or to identify critical scenarios and provide a synthesized, reinforced and time-synchronized soundtrack. Our algorithm is capable of precise recognition of actions as well as interframe relations in fast-moving video clips by incorporating an interpolation technique and temporal relational networks (TRN). We employ a robust multiscale recurrent neural network (RNN) and a convolutional neural network (CNN) for better understanding of the intricate input-to-output associations. To evaluate AutoFoley, we create an audio-video dataset containing a variety of sounds frequently used as Foley effects in movies. While the Foley dataset was limited to short-duration videos of representative activities, this dataset demonstrates the capabilities of our proposed system. We show that the synthesized sounds are portrayed with accurate temporal synchronization of the associated visual inputs. Human qualitative testing of AutoFoley shows that more than 73% of the test subjects considered the generated soundtrack as original, which is a noteworthy improvement in comparable cross-modal research.
computer science, information systems, telecommunications, software engineering