Abstract: Following the success of the transformer architecture in the natural language domain, transformer-like architectures have recently been widely applied to symbolic music. Symbolic music and text, however, are two different modalities. Symbolic music contains multiple attributes, both absolute (e.g., pitch) and relative (e.g., pitch interval), and the relative attributes shape human perception of musical motifs. These relative attributes are nevertheless mostly ignored in existing symbolic music modelling methods, mainly because of the lack of a musically meaningful embedding space in which both the absolute and the relative embeddings of symbolic music tokens can be efficiently represented. In this paper, we propose the Fundamental Music Embedding (FME) for symbolic music, based on a bias-adjusted sinusoidal encoding within which both the absolute and the relative attributes can be embedded and the fundamental musical properties (e.g., translational invariance) are explicitly preserved. Taking advantage of the proposed FME, we further propose a novel attention mechanism based on relative index, pitch and onset embeddings (RIPO attention), so that musical domain knowledge can be fully utilized for symbolic music modelling. Experimental results show that our proposed model, the RIPO transformer, which uses FME and RIPO attention, outperforms state-of-the-art transformers (i.e., music transformer, linear transformer) in a melody completion task. Moreover, when the RIPO transformer is used in a downstream music generation task, we observe that the notorious degeneration phenomenon no longer occurs, and the music generated by the RIPO transformer outperforms the music generated by state-of-the-art transformer models in both subjective and objective evaluations. The code of the proposed method is available online: github.com/guozixunnicolas/FundamentalMusicEmbedding.
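
As a rough illustration of the translational invariance mentioned in the abstract, the following minimal sketch embeds pitches with a generic sinusoidal encoding plus an additive bias vector and checks that the distance between two embeddings depends only on the pitch interval. The abstract does not give the exact FME formulation, so the function name `sinusoidal_embedding`, the frequency schedule, the dimension, and the randomly drawn bias are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def sinusoidal_embedding(value, dim=8, base=10000.0, bias=None):
    """Hypothetical sinusoidal embedding with an optional additive bias.

    Assumption: frequencies follow the usual geometric schedule used in
    transformer positional encodings; the paper's schedule may differ.
    """
    k = np.arange(dim // 2)
    freqs = base ** (-2.0 * k / dim)          # one frequency per sin/cos pair
    angles = value * freqs
    emb = np.concatenate([np.sin(angles), np.cos(angles)])
    if bias is not None:
        emb = emb + bias                      # stand-in for a learned bias
    return emb

rng = np.random.default_rng(0)
bias = rng.normal(size=8)                     # random placeholder for a trained bias

def distance(p, q):
    """Euclidean distance between the embeddings of two MIDI pitches."""
    return np.linalg.norm(
        sinusoidal_embedding(p, bias=bias) - sinusoidal_embedding(q, bias=bias)
    )

# Two major thirds (4 semitones) at different absolute pitches.
d_c_e = distance(60, 64)   # C4 -> E4
d_g_b = distance(67, 71)   # G4 -> B4
# A perfect fifth (7 semitones) for comparison.
d_c_g = distance(60, 67)   # C4 -> G4

print(np.isclose(d_c_e, d_g_b))   # same interval, same distance
print(np.isclose(d_c_e, d_c_g))   # different interval, different distance
```

In this sketch the bias cancels when two embeddings are subtracted, and each sine/cosine pair contributes a term that depends only on the difference of the two inputs, which is why transposing both pitches by the same amount leaves the distance unchanged.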

Motif-Centric Representation Learning for Symbolic Music

Motifs, Phrases, and Beyond: The Modelling of Structure in Symbolic Music Generation

The Beauty of Repetition: an Algorithmic Composition Model with Motif-level Repetition Generator and Outline-to-music Generator in Symbolic Music Generation

Symbolic Music Loop Generation with Neural Discrete Representations

A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling

Score Images as a Modality: Enhancing Symbolic Music Understanding through Large-Scale Multimodal Pre-Training

An Attentional Neural Network Architecture for Folk Song Classification

Learning Node Representation Via Motif Coarsening

Symbolic Music Representations for Classification Tasks: A Systematic Evaluation

N-Gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding

Mode-conditioned music learning and composition: a spiking neural network inspired by neuroscience and psychology

Exploring modality-agnostic representations for music classification

RUM: Network Representation Learning Using Motifs

motif2vec: Motif Aware Node Representation Learning for Heterogeneous Networks

MusicAOG: an Energy-Based Model for Learning and Sampling a Hierarchical Representation of Symbolic Music

Coordinate Embedding Transformer Model for Optical Music Recognition on Monophonic Scores

MeloTrans: A Text to Symbolic Music Generation Model Following Human Composition Habit

Modelling Symbolic Music: Beyond the Piano Roll

PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music

The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation

Learn A Robust Representation for Cover Song Identification Via Aggregating Local and Global Music Temporal Context