From West to East: Who can understand the music of the others better?

Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos

2023-07-19

Abstract: Recent developments in MIR have led to several benchmark deep learning models whose embeddings can be used for a variety of downstream tasks. At the same time, the vast majority of these models have been trained on Western pop/rock music and related styles. This raises the research questions of whether these models can be used to learn representations for different music cultures and styles, and whether similar music audio embedding models can be built from data from different cultures or styles. To that end, we leverage transfer learning methods to derive insights about the similarities between the music cultures to which the data belong. We use two Western music datasets, two traditional/folk datasets from eastern Mediterranean cultures, and two datasets of Indian art music. Three deep audio embedding models, two CNN-based architectures and one Transformer-based architecture, are trained and transferred across domains to perform auto-tagging on each target domain dataset. Experimental results show that competitive performance is achieved in all domains via transfer learning, while the best source dataset varies for each music culture. The implementation and the trained models are both provided in a public repository.

Sound, Computer Vision and Pattern Recognition, Machine Learning, Audio and Speech Processing

What problem does this paper attempt to address?

This paper asks whether deep learning models in Music Information Retrieval (MIR), most of which have been trained on Western pop/rock music, can learn and transfer music audio embeddings across cultures. The researchers use six datasets: two of Western music, two of traditional/folk music from eastern Mediterranean cultures, and two of Indian art music. They train three deep audio embedding models (two based on Convolutional Neural Networks (CNNs) and one based on the Transformer architecture) and transfer them across domains to perform auto-tagging on each target dataset. The goal is to measure how well each model performs in each cultural context and to identify which source dataset best supports each target domain. The experimental results show that transfer learning achieves competitive performance in all domains, but the best source dataset varies with the target music culture. The study also offers preliminary insights into the similarities between the music cultures involved.
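The transfer setup described above can be illustrated with a minimal sketch. It shows one common form of transfer learning, in which a model trained on a source dataset is kept frozen and only a new multi-label tagging layer is trained on the target dataset. This is an illustration under stated assumptions, not the paper's implementation: `FrozenBackbone`, `TagHead`, and `train_tag_head` are hypothetical names, the backbone is a fixed random projection standing in for a real CNN or Transformer, and the paper's actual transfer protocol is not specified in this summary and may differ (for example, by fine-tuning the whole network).

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class FrozenBackbone:
    """Hypothetical stand-in for a model pretrained on a source dataset.

    A fixed random projection followed by tanh; a real system would load
    trained CNN or Transformer weights instead.
    """

    def __init__(self, in_dim, emb_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
            for _ in range(emb_dim)
        ]

    def embed(self, features):
        # The weights are only read here, never updated: the backbone is frozen.
        return [
            math.tanh(sum(w * x for w, x in zip(row, features)))
            for row in self.weights
        ]


class TagHead:
    """New output layer for the target dataset: one sigmoid per tag."""

    def __init__(self, emb_dim, n_tags):
        self.w = [[0.0] * emb_dim for _ in range(n_tags)]
        self.b = [0.0] * n_tags

    def predict(self, emb):
        return [
            sigmoid(sum(wi * e for wi, e in zip(w, emb)) + b)
            for w, b in zip(self.w, self.b)
        ]

    def train_step(self, emb, labels, lr=0.5):
        probs = self.predict(emb)
        for t, (p, y) in enumerate(zip(probs, labels)):
            grad = p - y  # gradient of binary cross-entropy w.r.t. the logit
            for j, e in enumerate(emb):
                self.w[t][j] -= lr * grad * e
            self.b[t] -= lr * grad


def train_tag_head(backbone, clips, tag_labels, n_tags, epochs=50):
    """Train only the new tagging head on embeddings from the frozen backbone."""
    embeddings = [backbone.embed(clip) for clip in clips]
    head = TagHead(len(embeddings[0]), n_tags)
    for _ in range(epochs):
        for emb, labels in zip(embeddings, tag_labels):
            head.train_step(emb, labels)
    return head
```

In this sketch each clip is a plain list of input features and each label row is a list of 0/1 values, one per tag, since auto-tagging is a multi-label task. Repeating the procedure with backbones trained on different source datasets and comparing target-domain scores would give the kind of source-versus-target comparison the summary describes.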

One Deep Music Representation to Rule Them All?: A comparative analysis of different representation learning strategies

A Dataset for Learning Stylistic and Cultural Correlations Between Music and Videos

Towards Cross-Cultural Analysis using Music Information Dynamics

Transfer Learning and Bias Correction with Pre-trained Audio Embeddings

Supervised and Unsupervised Learning of Audio Representations for Music Understanding

Modeling of the Latent Embedding of Music using Deep Neural Network

Audio Embeddings as Teachers for Music Classification

Learning to Embed Music and Metadata for Context-Aware Music Recommendation

Exploring Musical Roots: Applying Audio Embeddings to Empower Influence Attribution for a Generative Music Model

A Study of Transfer Learning in Music Source Separation

Music Style Transfer: A Position Paper

N-Gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding

Musical Composition Style Transfer via Disentangled Timbre Representations

Crossing You in Style: Cross-modal Style Transfer from Music to Visual Arts

Music Sentiment Transfer

Model-Based Deep Learning for Music Information Research

Embedding Calibration for Music Semantic Similarity using Auto-regressive Transformer

A Universal Music Translation Network

Representations of Sound in Deep Learning of Audio Features from Music

A Tutorial on Deep Learning for Music Information Retrieval