Abstract: Many deep learning-based models for automatic music generation have been proposed recently. Generating long pieces of pop music with distinctive musical characteristics remains challenging because it relies heavily on musical structure. Some transformer-based models use self-attention to generate long music sequences; however, most pay little attention to well-organized musical structure. In this article, we propose a novel note-to-bar hierarchical model, the Bar Transformer, to address long-term dependency issues and generate impressive and structurally meaningful music. In particular, the note-to-bar approach pre-processes the notes within each individual bar, providing a strong structural constraint that increases the model's awareness of the note-to-bar structure in music. The Bar Transformer is built on an encoder-decoder framework consisting of a two-layer encoder and an arrangement decoder. In the two-layer encoder, the bottom layer is a note-level encoder, which outputs embeddings by learning the relations between notes within an individual bar, and the top layer is a bar-level encoder, which uses these embeddings to encode each bar of the melody and chords. The arrangement decoder generalizes the interrelationships among the bars and simultaneously generates melodies and chords. Structural analysis and aural evaluations show that our approach outperforms the Music Transformer model and other regressive models used for music generation.
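The note-to-bar pre-processing mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the note representation or the grouping procedure, so everything here is an assumption for illustration: the `Note` tuple, the `group_notes_by_bar` helper, the `beats_per_bar` parameter, and a fixed 4/4 meter are hypothetical and are not the authors' implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Note(NamedTuple):
    """Hypothetical note record; not the representation used in the paper."""
    pitch: int        # MIDI pitch number
    start: float      # onset in beats, measured from the start of the piece
    duration: float   # length in beats


def group_notes_by_bar(notes, beats_per_bar=4):
    """Group notes into bars, assuming a constant meter.

    Returns a list of bars. Each bar is a list of notes whose onsets are
    rewritten relative to the start of that bar. Empty bars are kept so
    that list positions match bar indices.
    """
    if not notes:
        return []
    bars = defaultdict(list)
    for note in notes:
        bar_index = int(note.start // beats_per_bar)
        relative_start = note.start - bar_index * beats_per_bar
        bars[bar_index].append(note._replace(start=relative_start))
    n_bars = max(bars) + 1
    # Sort each bar by onset, then pitch, for a deterministic order.
    return [sorted(bars[i], key=lambda n: (n.start, n.pitch)) for i in range(n_bars)]
```

Under these assumptions, each inner list would be the input to a note-level encoder, and the sequence of per-bar outputs would feed a bar-level encoder.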

Bar Transformer: a Hierarchical Model for Learning Long-Term Structure and Generating Impressive Pop Music

Transformer-Based Seq2Seq Model for Chord Progression Generation

Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation

Do we need more complex representations for structure? A comparison of note duration representation for Music Transformers

Hyperbolic Music Transformer for Structured Music Generation

The Power of Fragmentation: A Hierarchical Transformer Model for Structural Segmentation in Symbolic Music Generation

Structure-Enhanced Pop Music Generation via Harmony-Aware Learning

Structure-informed Positional Encoding for Music Generation

Small Tunes Transformer: Exploring Macro & Micro-Level Hierarchies for Skeleton-Conditioned Melody Generation

Structured Music Transformer: Structured Conditional Music Generation Based on Stylistic Clustering Using Transformer

MELONS: generating melody with long-term structure using transformers and structure graph

Music102: An $D_{12}$-equivariant transformer for chord progression accompaniment

A Multi-Scale Attentive Transformer for Multi-Instrument Symbolic Music Generation

Automatic Composition of Guitar Tabs by Transformers and Groove Modeling

Melody Structure Transfer Network: Generating Music with Separable Self-Attention

PopMNet: Generating Structured Pop Music Melodies Using Neural Networks

Transformer-XL Based Music Generation with Multiple Sequences of Time-valued Notes

A Transformer-Based Model for Multi-Track Music Generation

Choir Transformer: Generating Polyphonic Music with Relative Attention on Transformer

Symbolic Music Generation with Transformer-GANs

Multi-Genre Music Transformer -- Composing Full Length Musical Piece