Double Linear Transformer for Background Music Generation from Videos

Xueting Yang, Ying Yu, Xiaoyu Wu
DOI: https://doi.org/10.3390/app12105050
2022-01-01
Abstract: Many music generation studies have achieved good performance, but few of them combine music with a given video. We propose a model with two linear Transformers that generates background music for a given video. To enhance the melodic quality of the generated music, we first feed note-related and rhythm-related music features separately into each Transformer network, paying particular attention to both the connection and the independence of these features. Then, to generate music that matches the given video, we adopt the current state-of-the-art cross-modal inference method to establish the relationship between the visual and audio modalities. Subjective and objective experiments indicate that the generated background music matches the video well and is also melodious.
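The abstract does not give the attention formulation. The sketch below assumes that "linear Transformer" refers to the kernelized causal attention of Katharopoulos et al. (2020), which uses the feature map elu(x) + 1. Under that assumption, it shows why the cost is linear in sequence length: a running prefix state replaces the full attention matrix. The function names `causal_linear_attention`, `causal_kernel_attention_reference` and `double_stream` are hypothetical. Learned projections, multiple heads and the cross-modal video conditioning are omitted. This is not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def elu_plus_one(x: Vector) -> Vector:
    """Feature map phi(x) = elu(x) + 1; every entry is strictly positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def causal_linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Causal linear attention with a running prefix state (assumed formulation).

    out_i = phi(q_i)^T S_i / (phi(q_i)^T z_i), where
    S_i = sum_{j<=i} phi(k_j) v_j^T and z_i = sum_{j<=i} phi(k_j).
    Each step updates S and z once, so cost grows linearly with length.
    """
    d_k, d_v = len(k[0]), len(v[0])
    s = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k_j) v_j^T
    z = [0.0] * d_k                         # running sum of phi(k_j)
    out: Matrix = []
    for qi, ki, vi in zip(q, k, v):
        fk = elu_plus_one(ki)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                s[a][b] += fk[a] * vi[b]
        fq = elu_plus_one(qi)
        denom = sum(fq[a] * z[a] for a in range(d_k))  # > 0 because phi > 0
        out.append([sum(fq[a] * s[a][b] for a in range(d_k)) / denom
                    for b in range(d_v)])
    return out


def causal_kernel_attention_reference(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Same kernelized attention computed the quadratic way, for comparison."""
    out: Matrix = []
    for i, qi in enumerate(q):
        fq = elu_plus_one(qi)
        weights = [sum(x * y for x, y in zip(fq, elu_plus_one(k[j])))
                   for j in range(i + 1)]
        total = sum(weights)
        out.append([sum(w * v[j][b] for j, w in enumerate(weights)) / total
                    for b in range(len(v[0]))])
    return out


def double_stream(note_tokens: Matrix, rhythm_tokens: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical two-stream step: note-related and rhythm-related feature
    sequences each go through their own attention layer. Identity projections
    (q = k = v = input) are used purely for brevity."""
    notes_out = causal_linear_attention(note_tokens, note_tokens, note_tokens)
    rhythm_out = causal_linear_attention(rhythm_tokens, rhythm_tokens, rhythm_tokens)
    return notes_out, rhythm_out
```

Usage: calling `causal_linear_attention(q, k, v)` and `causal_kernel_attention_reference(q, k, v)` on the same inputs should give matching outputs up to floating-point error. The first method keeps only a d_k × d_v state per step, while the reference stores one weight per past position. Keeping the two feature streams in separate layers mirrors the abstract's separate note and rhythm inputs. How the two streams interact in the actual model is not specified here.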