Visualizing Video Sounds With Sound Word Animation to Enrich User Experience

Fangzhou Wang, Hidehisa Nagano, Kunio Kashino, Takeo Igarashi
DOI: https://doi.org/10.1109/tmm.2016.2613641
IF: 7.3
2017-02-01
IEEE Transactions on Multimedia
Abstract: Sound information in videos plays an important role in shaping the user experience. When sound is not accessible in videos, text captions can provide sound information. However, conventional text captions are not very expressive for nonverbal sounds because they are designed to visualize speech sounds. Here, we present a framework to automatically transform nonverbal video sounds into animated sound words and position them near the sound source objects in the video for visualization. This provides a natural visual representation of nonverbal sounds with rich information about the sound category and dynamics. To evaluate how the animated sound words generated by our framework affect the user experience, we implemented an experimental system and conducted a user study involving over 300 participants from an online crowdsourcing service. The results of the user study show that the animated sound words can effectively and naturally visualize the dynamics of sound while clarifying the position of the sound source, and that they make video watching more enjoyable and increase the visual impact of videos.
computer science, information systems, telecommunications, software engineering