deepsing: Generating sentiment-aware visual stories using cross-modal music translation

Nikolaos Passalis, Stavros Doropoulos
DOI: https://doi.org/10.1016/j.eswa.2020.114059
IF: 8.5
2021-02-01
Expert Systems with Applications
Abstract: In this paper we propose a deep learning method for performing attribute-based music-to-image translation. The proposed method is applied for synthesizing visual stories according to the sentiment expressed by songs. The generated images aim to induce in viewers the same feelings as the original song, reinforcing the primary aim of music, i.e., communicating feelings. The process of music-to-image translation poses unique challenges, mainly due to the unstable mapping between the different modalities involved in this process. In this paper, we employ a trainable cross-modal translation method to overcome this limitation, leading to the first, to the best of our knowledge, deep learning method for generating sentiment-aware visual stories. The proposed method was evaluated both quantitatively and qualitatively using a collection of songs that belong to 10 different genres, demonstrating that it is indeed possible to generate visual content that can match the sentiment expressed in songs. A user study was also conducted, further validating the ability of the proposed method to provide sentiment-enriched visualizations.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science