Triplet Convolutional Network for Music Version Identification

Xiaoyu Qi, Deshun Yang, Xiaoou Chen
DOI: https://doi.org/10.1007/978-3-319-73603-7_44
2018-01-01
Abstract: Music version identification has long been a difficult task in music information retrieval because versions of the same work can differ in tempo, key and structure. Most existing methods rely on hand-crafted features, whose design requires extensive human effort and expert participation, and further breakthroughs along this line are hard to achieve. We therefore propose a triplet convolutional embedding network for version identification that learns feature representations of music automatically in a supervised way. The triplet convolutional network learns segment-level features from training data, focusing on the most similar parts of different versions rather than on whole songs. We also compare triplet-based learning with pair-based learning. Our approach has two main advantages over existing ones: (1) music features are embedded automatically and in a supervised way, so the architecture becomes more promising as the amount of music data keeps growing; (2) embedding features at the segment level is more precise, since the query audio can be any identifiable segment of a music work and can vary in length. Extensive experiments demonstrate the effectiveness of our method.
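The abstract does not specify the loss functions, the distance measure, or how segment embeddings are matched. The sketch below is a minimal illustration under stated assumptions: embeddings are taken as already produced by the convolutional network, the triplet objective is the standard hinge triplet loss, the pair-based objective is a common contrastive-loss form, distances are Euclidean, and two recordings are compared by their closest pair of segment embeddings. The function names, the margin of 1.0, and the min-over-segments rule are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import Sequence

# Assumption: an embedding is a fixed-length vector of floats produced by
# the (not shown) convolutional network for one audio segment.
Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Hinge triplet loss (assumed form).

    `positive` is a segment from another version of the anchor's work,
    `negative` is a segment from a different work. The loss is zero once
    the negative is at least `margin` farther from the anchor than the
    positive, so training only compares relative distances.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def contrastive_loss(a: Vector, b: Vector, same_work: bool,
                     margin: float = 1.0) -> float:
    """Pair-based (contrastive) loss, one common form (assumed).

    Pairs of versions are pulled together; pairs of different works are
    pushed apart until they are at least `margin` away. Unlike the triplet
    loss, each pair is judged against an absolute distance target.
    """
    d = euclidean(a, b)
    if same_work:
        return d ** 2
    return max(0.0, margin - d) ** 2


def segment_match_distance(query_segments: Sequence[Vector],
                           candidate_segments: Sequence[Vector]) -> float:
    """Distance between two recordings as their closest segment pair.

    Hypothetical matching rule: a short query of any length only has to
    match the most similar part of a candidate, not the whole song.
    """
    return min(euclidean(q, c)
               for q in query_segments
               for c in candidate_segments)


if __name__ == "__main__":
    anchor, positive, negative = [0, 0], [0, 1], [0, 3]
    print(triplet_loss(anchor, positive, negative))      # 0.0, margin satisfied
    print(triplet_loss(anchor, negative, positive))      # 3.0, roles swapped
    print(contrastive_loss(anchor, positive, True))      # 1.0
    print(segment_match_distance([anchor], [[5, 5], [0, 2]]))  # 2.0
```

The contrast between the two losses is the design point the abstract raises: the triplet loss only asks that a version be closer to the anchor than a non-version, while the pair-based loss imposes absolute distance targets on each pair independently.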