Abstract: Controllable music generation promotes interaction between humans and composition systems by projecting users' intent onto the music they want. Introducing controllability is an increasingly important problem in symbolic music generation. Controllable generative systems for popular multi-instrument music typically face two main challenges: weak controllability and poor music quality. To address these issues, we first propose spatiotemporal features as powerful, fine-grained controls that enhance the controllability of the generative model. In addition, we design an efficient music representation, REMI_Track, which converts multitrack music into multiple parallel music sequences and shortens the sequence of each track with Byte Pair Encoding (BPE). We then release BandControlNet, a conditional model based on parallel Transformers, which processes the multiple music sequences and generates high-quality music samples conditioned on the given spatiotemporal control features. More concretely, two specially designed modules of BandControlNet, structure-enhanced self-attention (SE-SA) and the Cross-Track Transformer (CTT), strengthen the modeling of musical structure and inter-track harmony, respectively. Experiments on two popular music datasets of different lengths show that BandControlNet outperforms other conditional music generation models on most objective metrics in terms of fidelity and inference speed, and shows strong robustness when generating long music samples. In subjective evaluations, BandControlNet trained on the shorter dataset generates music of quality comparable to state-of-the-art models, and it significantly outperforms them on the longer dataset.
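As a rough illustration of how BPE can shorten a per-track token sequence, the following minimal sketch repeatedly merges the most frequent adjacent token pair into a single new token. It is a generic BPE loop, not the authors' implementation: the token names (`Pitch_60`, `Dur_4`, and so on), the `+` merge notation, and the helper functions are all hypothetical and do not reflect the actual REMI_Track vocabulary.

```python
from collections import Counter


def most_frequent_pair(seq):
    """Return (pair, count) for the most frequent adjacent pair, or None."""
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return None
    return pairs.most_common(1)[0]


def merge_pair(seq, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` with `new_token`."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def bpe_compress(seq, num_merges):
    """Apply up to `num_merges` merges; stop when no pair repeats."""
    merges = []
    for _ in range(num_merges):
        best = most_frequent_pair(seq)
        if best is None or best[1] < 2:
            break  # merging a pair that occurs once saves nothing
        pair = best[0]
        seq = merge_pair(seq, pair, pair[0] + "+" + pair[1])
        merges.append(pair)
    return seq, merges


# Hypothetical toy track: alternating pitch and duration tokens.
track = [
    "Pitch_60", "Dur_4", "Pitch_64", "Dur_4",
    "Pitch_60", "Dur_4", "Pitch_67", "Dur_8",
]
compressed, merges = bpe_compress(track, num_merges=2)
```

In this toy case only the pair (`Pitch_60`, `Dur_4`) occurs more than once, so a single merge is learned and the sequence shrinks from 8 tokens to 6. A real tokenizer would learn merges over a whole corpus rather than one sequence.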

BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features
