Abstract: An automated process that suggests a soundtrack for a user-generated video (UGV) and turns the UGV into a music-compliant, professional-like video is challenging but desirable. To this end, this paper presents an automatic music video (MV) generation system that conducts soundtrack recommendation and video editing simultaneously. First, a long UGV is divided into a sequence of fixed-length short segments (e.g., 2 seconds each), and a multi-task deep neural network (MDNN) is applied to predict the pseudo acoustic (music) features, also called the pseudo song, from the visual (video) features of each video segment. In this way, the distance between any pair of video and music segments of the same length can be computed in the music feature space. Second, the sequence of pseudo acoustic (music) features of the UGV and the sequence of acoustic (music) features of each music track in the music collection are temporally aligned by the dynamic time warping (DTW) algorithm with a pseudo-song-based deep similarity matching (PDSM) metric. Third, for each music track, the video editing module selects and concatenates segments of the UGV according to the DTW-aligned result, using target and concatenation costs given by a pseudo-song-based deep concatenation cost (PDCC) metric, to generate a music-compliant, professional-like video. Finally, all the generated MVs are ranked, and the best MV is recommended to the user. The MDNN for pseudo song prediction and the PDSM and PDCC metrics are trained on an annotated official music video (OMV) corpus. The results of objective and subjective experiments demonstrate that the proposed system performs well and can generate appealing MVs with better viewing and listening experiences.
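The temporal alignment step described in the abstract can be illustrated with a minimal DTW sketch. This is not the authors' implementation: the function name `dtw_align` is hypothetical, the inputs are assumed to be per-segment feature matrices that already live in the same music feature space, and plain Euclidean distance is used as a stand-in for the learned PDSM metric, which the abstract does not specify in detail.

```python
import numpy as np


def dtw_align(video_feats, music_feats):
    """Align two feature sequences with classic dynamic time warping.

    video_feats: (n, d) array of pseudo acoustic features, one row per video segment.
    music_feats: (m, d) array of acoustic features, one row per music segment.
    Returns (total_cost, path), where path is a list of (video_idx, music_idx) pairs.
    """
    n, m = len(video_feats), len(music_feats)

    # Pairwise local cost. Assumption: Euclidean distance replaces the PDSM metric.
    cost = np.linalg.norm(
        video_feats[:, None, :] - music_feats[None, :, :], axis=-1
    )

    # acc[i, j] holds the cheapest cost of aligning the first i video segments
    # with the first j music segments.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # advance both sequences
                acc[i - 1, j],      # advance video only
                acc[i, j - 1],      # advance music only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[n, m]), path


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.normal(size=(5, 8))   # 5 video segments, 8-dim pseudo song features
    music = rng.normal(size=(7, 8))   # 7 music segments, 8-dim acoustic features
    total, path = dtw_align(video, music)
    print(total, path)
```

Under these assumptions, running `dtw_align` against every track in a music collection yields one alignment cost and one warping path per track; the path indicates which video segment is matched to each music segment and could serve as the input to a subsequent segment selection step.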

Automatic Music Video Generation Based on Simultaneous Soundtrack Recommendation and Video Editing

audeosynth: music-driven video montage

Automatic Realistic Music Video Generation from Segments of Youtube Videos

A System For Automatic Generation Of Music Sports-Video

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

VidMusician: Video-to-Music Generation with Semantic-Rhythmic Alignment via Hierarchical Visual Features

Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization

A Music-Driven System for Generating Apparel Display Video

DEMV-matchmaker: Emotional temporal course representation and deep similarity matching for automatic music video generation

Video Echoed in Harmony: Learning and Sampling Video-Integrated Chord Progression Sequences for Controllable Video Background Music Generation

Learning to Visualize Music Through Shot Sequence for Automatic Concert Video Mashup

Video Background Music Generation: Dataset, Method and Evaluation

MVBIND: Self-Supervised Music Recommendation For Videos Via Embedding Space Binding

V2Meow: Meowing to the Visual Beat via Video-to-Music Generation

Music Conditioned Generation for Human-Centric Video

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Diff-BGM: A Diffusion Model for Video Background Music Generation

Automatic Generation of Interactive Nonlinear Video for Online Apparel Shopping Navigation

Language-Guided Music Recommendation for Video via Prompt Analogies

Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements