A Deep Neural Network for Modeling Music

Pengjing Zhang, Xiaoqing Zheng, Wenqiang Zhang, Siyan Li, Sheng Qian, Wenqi He, Shangtong Zhang, Ziyuan Wang
DOI: https://doi.org/10.1145/2671188.2749367
2015-01-01
Abstract: We propose a convolutional neural network architecture with a k-max pooling layer for semantic modeling of music. The aim of a music model is to analyze and represent the semantic content of music for purposes of classification, discovery, or clustering. The k-max pooling layer lets the network pool the k most active features, capturing semantically rich, time-varying information about the music. The network takes a piece of music as input in the form of a sequence of audio words, where each audio word is associated with a distributed feature vector that can be fine-tuned by backpropagating errors during training. This architecture lets us take advantage of better-trained audio word embeddings and deep structure to produce more robust music representations. Experimental results on two different music collections show that our neural networks achieved the best accuracy in music genre classification compared with three state-of-the-art systems.
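The k-max pooling operation named in the abstract can be illustrated with a small sketch. This is a minimal pure-Python version, not the authors' code, and it assumes the common formulation in which, for each feature row of a convolutional feature map, the k largest activations along the time axis are kept in their original temporal order. The function name `k_max_pooling` and the row-per-feature layout are assumptions made for illustration.

```python
from typing import List, Sequence


def k_max_pooling(feature_map: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Illustrative k-max pooling over the time axis (assumed formulation).

    feature_map: one row per feature (e.g. per convolutional filter),
                 each row a sequence of activations over time.
    Returns, for each row, its k largest activations in their original order.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    pooled: List[List[float]] = []
    for row in feature_map:
        if len(row) < k:
            raise ValueError("row is shorter than k")
        # Indices of the k largest activations; ties go to the earlier position.
        top = sorted(range(len(row)), key=lambda i: (-row[i], i))[:k]
        # Re-sort the selected indices so the pooled values keep their temporal order.
        pooled.append([row[i] for i in sorted(top)])
    return pooled


if __name__ == "__main__":
    fm = [
        [0.1, 0.9, 0.3, 0.7, 0.2],
        [0.6, 0.4, 0.1, 0.8, 0.2],
    ]
    print(k_max_pooling(fm, 2))  # [[0.9, 0.7], [0.6, 0.8]]
```

In the second row the pooled output is `[0.6, 0.8]` rather than `[0.8, 0.6]`: keeping the selected activations in time order is what lets k-max pooling retain some information about when strong features occur, whereas ordinary max pooling keeps only a single value per feature.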