Abstract: Most existing neural network models for music generation focus on generating individual music bars and then splice those bars directly into a song. These methods do not model the relationships between bars, so the assembled song lacks musical form and a sense of musical direction. To address this issue, we propose a Multi-model Multi-task Hierarchical Conditional VAE-GAN (Variational Autoencoder-Generative Adversarial Network), named MIDI-Sandwich, which incorporates musical knowledge such as musical form, tonic, and melodic motion. MIDI-Sandwich has two submodels: a Hierarchical Conditional Variational Autoencoder (HCVAE) and a Hierarchical Conditional Generative Adversarial Network (HCGAN). The HCVAE has a hierarchical structure. Its lower layer uses a Local Conditional Variational Autoencoder (L-CVAE) to generate a music bar whose First and Last Notes (FLN) are specified in advance. Its upper layer uses a Global Variational Autoencoder (G-VAE) to analyze the sequence of latent vectors produced by the L-CVAE encoder, to model the musical relationships between bars, and to produce a song assembled from the bars generated by the L-CVAE decoder, so that the song has both musical structure and a sense of direction. In addition, the HCVAE shares part of itself with the HCGAN to further improve the quality of the generated music. MIDI-Sandwich is validated on the Nottingham dataset and can generate a single-track melody sequence of 17x8 beats, which is longer than the output of most generative models (8 to 32 beats). The quality of the generated music is also evaluated following experimental methods established in the literature. These experiments support the validity of the model.
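The two-level data flow of the HCVAE described above can be illustrated with a minimal sketch. It is not the authors' implementation: random linear maps stand in for trained networks, VAE sampling and the HCGAN branch are omitted, and the latent sizes, the function names (`linear`, `first_last_notes`, `build_model`, `reconstruct_song`), and the reading of "17x8 beats" as 17 bars of 8 beats are all assumptions made for illustration.

```python
import random

BARS_PER_SONG = 17   # assumed reading of "17x8 beats": 17 bars
BEATS_PER_BAR = 8    # ... of 8 beats each
LOCAL_LATENT = 4     # hypothetical size of a per-bar latent vector
GLOBAL_LATENT = 6    # hypothetical size of the song-level latent vector


def linear(n_in, n_out, rng):
    """Return a random linear map; a stand-in for a trained network."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return lambda x: [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def first_last_notes(bar):
    """FLN condition: the first and last note of a bar."""
    return [bar[0], bar[-1]]


def build_model(seed=0):
    rng = random.Random(seed)
    return {
        # Lower layer (L-CVAE stand-in): bar + FLN -> latent, latent + FLN -> bar
        "local_enc": linear(BEATS_PER_BAR + 2, LOCAL_LATENT, rng),
        "local_dec": linear(LOCAL_LATENT + 2, BEATS_PER_BAR, rng),
        # Upper layer (G-VAE stand-in): works on the whole sequence of bar latents
        "global_enc": linear(BARS_PER_SONG * LOCAL_LATENT, GLOBAL_LATENT, rng),
        "global_dec": linear(GLOBAL_LATENT, BARS_PER_SONG * LOCAL_LATENT, rng),
    }


def reconstruct_song(song, model):
    """Pass a song (list of bars, each a list of pitches) through both layers."""
    flns = [first_last_notes(bar) for bar in song]
    # Lower layer: encode each bar, conditioned on its first and last notes.
    local_z = [model["local_enc"](bar + fln) for bar, fln in zip(song, flns)]
    # Upper layer: compress the full latent sequence, then expand it again,
    # so every bar latent depends on all the other bars.
    flat = [v for z in local_z for v in z]
    flat_out = model["global_dec"](model["global_enc"](flat))
    new_z = [flat_out[i * LOCAL_LATENT:(i + 1) * LOCAL_LATENT]
             for i in range(BARS_PER_SONG)]
    # Lower layer again: decode each bar from its latent and its FLN condition.
    return [model["local_dec"](z + fln) for z, fln in zip(new_z, flns)]
```

Calling `reconstruct_song(song, build_model())` on a song of 17 bars returns 17 decoded bars of 8 values each. The point of the sketch is only the structure: bar-level encoding and decoding are conditioned on the FLN, while the upper layer is the only place where information flows between bars.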

MIDI-Sandwich: Multi-model Multi-task Hierarchical Conditional VAE-GAN networks for Symbolic Single-track Music Generation
