Abstract:Most existing neural network models for music generation explore how to generate music bars, then directly splice the music bars into a song. However, these methods do not explore the relationship between the bars, and the connected song as a whole has no musical form structure and sense of musical direction. To address this issue, we propose a Multi-model Multi-task Hierarchical Conditional VAE-GAN (Variational Autoencoder-Generative adversarial networks) networks, named MIDI-Sandwich, which combines musical knowledge, such as musical form, tonic, and melodic motion. The MIDI-Sandwich has two submodels: Hierarchical Conditional Variational Autoencoder (HCVAE) and Hierarchical Conditional Generative Adversarial Network (HCGAN). The HCVAE uses hierarchical structure. The underlying layer of HCVAE uses Local Conditional Variational Autoencoder (L-CVAE) to generate a music bar which is pre-specified by the First and Last Notes (FLN). The upper layer of HCVAE uses Global Variational Autoencoder(G-VAE) to analyze the latent vector sequence generated by the L-CVAE encoder, to explore the musical relationship between the bars, and to produce the song pieced together by multiple music bars generated by the L-CVAE decoder, which makes the song both have musical structure and sense of direction. At the same time, the HCVAE shares a part of itself with the HCGAN to further improve the performance of the generated music. The MIDI-Sandwich is validated on the Nottingham dataset and is able to generate a single-track melody sequence (17x8 beats), which is superior to the length of most of the generated models (8 to 32 beats). Meanwhile, by referring to the experimental methods of many classical kinds of literature, the quality evaluation of the generated music is performed. The above experiments prove the validity of the model.

SteelyGAN: Semantic Unsupervised Symbolic Music Genre Transfer.

Style Fader Generative Adversarial Networks for Style Degree Controllable Artistic Style Transfer

Music Style Transfer: A Position Paper

Music Sentiment Transfer

Play as You Like: Timbre-enhanced Multi-modal Music Style Transfer

Supervised Symbolic Music Style Translation Using Synthetic Data

Preserving Structural Consistency in Arbitrary Artist and Artwork Style Transfer

Crossing You in Style: Cross-modal Style Transfer from Music to Visual Arts

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment

Message-Driven Generative Music Steganography Using MIDI-GAN

N-Gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding

Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders

MIDI-Sandwich: Multi-model Multi-task Hierarchical Conditional VAE-GAN networks for Symbolic Single-track Music Generation

Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings

Musical Composition Style Transfer via Disentangled Timbre Representations

An automatic music generation and evaluation method based on transfer learning

Music Style Transfer with Time-Varying Inversion of Diffusion Models

Quantifying Musical Style: Ranking Symbolic Music based on Similarity to a Style

A Training-Free Approach for Music Style Transfer with Latent Diffusion Models

Generating Symbolic Music from Natural Language Prompts using an LLM-Enhanced Dataset

Music Generation System for Adversarial Training Based on Deep Learning