Musical Elements Enhancement and Image Content Preservation Network for Image to Music Generation

Wenzhao Liu, Dechao Meng
DOI: https://doi.org/10.1109/BigData59044.2023.10386748
2023-12-15
Abstract: Image to music generation is a new task that has the potential to enhance the creative process in film, television, and game production. Because images and music are very different modalities, generating music from images is considerably challenging. We introduce an image to music generation framework that maintains musicality while conforming to the semantic content of the original image. It consists of two paths, one focusing on musical element enhancement and the other on image detail preservation. Our experiments show that the dual-path network outperforms our previous single-path model. Furthermore, our model is able to create highly diverse music pieces. We define several categories of musical terms for CLIP to match against, which gives the model more choices. Some generated music samples can be found at https://andyliu2008.github.io/image2music/
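As a rough illustration of what matching categorized musical terms with CLIP could look like, the sketch below picks, for each category, the term whose embedding has the highest cosine similarity to the image embedding. It is not the authors' implementation: the category names, the terms, the helper `match_terms`, and the toy 3-dimensional vectors are all hypothetical stand-ins, and in practice the vectors would come from a CLIP image and text encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def match_terms(image_vec, term_vecs_by_category):
    """For each category, return the term closest to the image embedding."""
    return {
        category: max(terms, key=lambda t: cosine(image_vec, terms[t]))
        for category, terms in term_vecs_by_category.items()
    }


# Toy vectors standing in for CLIP embeddings (hypothetical values).
image_vec = [0.9, 0.1, 0.2]
term_vecs = {
    "tempo": {"fast": [0.8, 0.2, 0.1], "slow": [0.1, 0.9, 0.3]},
    "mood": {"bright": [0.7, 0.0, 0.4], "dark": [0.0, 0.6, 0.8]},
}

selected = match_terms(image_vec, term_vecs)
print(selected)
```

Selecting one term per category is only one possible design; sampling among the top few terms in each category would be another way to get more varied outputs.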
Computer Science