Popular Hooks: A Multimodal Dataset of Musical Hooks for Music Understanding and Generation

Xinda Wu, Jiaming Wang, Jiaxing Yu, Tieyao Zhang, Kejun Zhang
DOI: https://doi.org/10.1109/icmew63481.2024.10645427
2024-01-01
Abstract: The Internet is rich in unimodal music data, available in either symbolic or audio representations. However, there is a notable scarcity of multimodal music datasets that offer aligned modal information and comprehensive annotations for music understanding and generation. In this paper, we introduce Popular Hooks: a publicly accessible multimodal music dataset comprising 38,694 popular musical hooks (i.e., memorable sections of songs) with synchronized MIDI, music video, audio, and lyrics. Furthermore, the dataset provides detailed annotations of high-level musical attributes such as tonality, structure, genre, emotion, and region. Specifically, we leverage a pre-trained multimodal music emotion recognition framework for automatic emotion labeling and conduct a user study to assess its accuracy. Finally, we explore emotion-conditioned music generation baselines using this dataset, demonstrating its potential to advance the field.
What problem does this paper attempt to address?