Abstract: Existing music generation models are mostly language-based and neglect the frequency continuity of notes. As a result, they fit rare or never-used notes poorly, which reduces the diversity of generated samples. We argue that the distribution of notes can be modeled through translational invariance and periodicity, in particular by using diffusion models that generalize notes by injecting frequency-domain Gaussian noise. However, because music symbols are low-density, estimating the distribution of notes latent in the high-density solution space poses significant challenges. To address this problem, we introduce the Music-Diff architecture, which fits a joint distribution of notes and accompanying semantic information to generate symbolic music conditionally. We first enhance the fragmentation module that extracts semantics by using event-based notation and the structural similarity index, thereby preventing boundary blurring. As a prerequisite for multivariate perturbation, we introduce a joint pre-training method that constructs the progressions between notes and musical semantics while avoiding direct modeling of low-density notes. Finally, we recover the perturbed notes with a multi-branch denoiser that fits multiple noise objectives via Pareto optimization. Our experiments suggest that, in contrast to language models, joint probability diffusion models that perturb at both the note and semantic levels can provide greater sample diversity and compositional regularity. The case study highlights the rhythmic advantages of our model over language-model-based and DDPM-based models by analyzing the hierarchical structure expressed in the self-similarity metrics.
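The abstract's central claim is that perturbing notes with Gaussian noise along the pitch (frequency) axis lets a diffusion model reach neighbouring, rarely seen notes, which a token-based language model cannot do because it has no notion of pitch distance. The sketch below illustrates only the standard DDPM forward process q(x_t | x_0) applied to a melody. It assumes notes are encoded as continuous MIDI pitch values scaled to [-1, 1] and uses the common linear beta schedule (1e-4 to 0.02 over 1000 steps). Both choices are illustrative assumptions, not the Music-Diff representation, and every function name is hypothetical.

```python
import math
import random

# Hypothetical helpers for illustration; the pitch encoding and schedule are
# assumptions, not the Music-Diff implementation.


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (common DDPM default)."""
    if num_steps < 1:
        raise ValueError("num_steps must be positive")
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s), one value per step."""
    alpha_bars = []
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def pitch_to_unit(pitch):
    """Map a MIDI pitch in [0, 127] linearly onto [-1, 1].

    Assumption: pitch is a continuous coordinate, so adjacent semitones stay
    adjacent after scaling (unlike one-hot tokens in a language model).
    """
    return pitch / 63.5 - 1.0


def unit_to_pitch(x):
    """Inverse of pitch_to_unit, rounded and clamped to a valid MIDI pitch."""
    return min(127, max(0, round((x + 1.0) * 63.5)))


def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I).

    t is a 0-based index into alpha_bars; rng is a random.Random instance.
    """
    ab = alpha_bars[t]
    scale, sigma = math.sqrt(ab), math.sqrt(1.0 - ab)
    return [scale * x + sigma * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    melody = [60, 62, 64, 65, 67]  # C D E F G as MIDI pitches
    x0 = [pitch_to_unit(p) for p in melody]
    for t in (5, 50, 500):
        xt = q_sample(x0, t, alpha_bars, rng)
        # Small t: pitches drift to nearby semitones, possibly ones absent
        # from the melody; large t: the melody is close to pure noise.
        print(t, [unit_to_pitch(x) for x in xt])
```

At small t the decoded pitches typically land on adjacent semitones, including ones the input never used, which is the continuity property the abstract attributes to frequency-domain perturbation. This sketch does not model the joint note/semantic perturbation, the joint pre-training, or the Pareto-optimized multi-branch denoiser.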

Efficient Fine-Grained Guidance for Diffusion-Based Symbolic Music Generation

Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

Fast Diffusion GAN Model for Symbolic Music Generation Controlled by Emotions

GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework

Composer Style-specific Symbolic Music Generation Using Vector Quantized Discrete Diffusion Models

DiffuseRoll: Multi-track multi-category music generation based on diffusion model

Taming Diffusion Models for Music-driven Conducting Motion Generation

Symbolic Music Generation with Diffusion Models

Controllable Music Production with Diffusion Models and Guidance Gradients

Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models

Generating High-quality Symbolic Music Using Fine-grained Discriminators

DiffuseRoll: multi-track multi-attribute music generation based on diffusion model

Why Perturbing Symbolic Music is Necessary: Fitting the Distribution of Never-used Notes through a Joint Probabilistic Diffusion Model

Discrete Diffusion Probabilistic Models for Symbolic Music Generation

PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation

Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance

Generating symbolic music using diffusion models

FIGARO: Generating Symbolic Music with Fine-Grained Artistic Control

Performance Conditioning for Diffusion-Based Multi-Instrument Music Synthesis

Flexible Control in Symbolic Music Generation via Musical Metadata

Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls