Abstract: In recent years, artificial neural networks (ANNs) have become a universal tool for tackling real-world problems. ANNs have also shown great success in music-related tasks, including music summarization and classification, similarity estimation, computer-aided or autonomous composition, and automatic music analysis. As structure is a fundamental characteristic of Western music, it plays a role in all these tasks. Some structural aspects are particularly challenging to learn with current ANN architectures. This is especially true for mid- and high-level self-similarity and for tonal and rhythmic relationships. In this thesis, I explore the application of ANNs to different aspects of musical structure modeling, identify some of the challenges involved, and propose strategies to address them. First, a probabilistic bottom-up approach to melody segmentation is studied, which uses the probability estimates of a Restricted Boltzmann Machine (RBM). Then, a top-down method for imposing a high-level structural template in music generation is presented, which combines Gibbs sampling using a convolutional RBM with gradient-descent optimization on the intermediate solutions. Furthermore, I motivate the relevance of musical transformations in structure modeling and show how a connectionist model, the Gated Autoencoder (GAE), can be employed to learn transformations between musical fragments. For learning transformations in sequences, I propose a special predictive training of the GAE, which yields a representation of polyphonic music as a sequence of intervals. I also show that these interval representations can be applied to a top-down discovery of repeated musical sections. Finally, a recurrent variant of the GAE is proposed, and its efficacy in music prediction and in modeling low-level repetition structure is demonstrated.
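As a rough illustration of the bottom-up segmentation idea, the following minimal sketch assumes that a model has already assigned a probability to each note of a melody and that segment boundaries fall where a note is much less expected than its predecessor. The abstract does not specify the boundary rule, so the peak-picking criterion, the `threshold` parameter, the function names, and the probability values are all hypothetical; in the thesis the probabilities come from an RBM, whereas here they are made up.

```python
import math


def information_content(probs):
    """Convert event probabilities to information content in bits (-log2 p)."""
    return [-math.log2(p) for p in probs]


def segment_boundaries(probs, threshold=1.0):
    """Return indices of notes that start a new segment.

    Assumed rule (not taken from the thesis): a boundary is a local peak
    in information content that rises at least `threshold` bits above
    the previous note.
    """
    ic = information_content(probs)
    boundaries = []
    for i in range(1, len(ic) - 1):
        is_peak = ic[i] > ic[i - 1] and ic[i] >= ic[i + 1]
        if is_peak and ic[i] - ic[i - 1] >= threshold:
            boundaries.append(i)
    return boundaries


# Made-up per-note probabilities standing in for model estimates.
note_probs = [0.5, 0.5, 0.4, 0.05, 0.5, 0.6, 0.5, 0.04, 0.5]
print(segment_boundaries(note_probs))  # [3, 7]
```

In this toy input, the two low-probability notes at positions 3 and 7 are the only ones flagged, which splits the melody into three segments.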

Large Language Models: From Notes to Musical Form

Musical Form Generation

Integrating Text-to-Music Models with Language Models: Composing Long Structured Music Pieces

Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings

Language Models are Drummers: Drum Composition with Natural Language Pre-Training

Long-form music generation with latent diffusion

Conditioning Deep Generative Raw Audio Models for Structured Automatic Music

Video-driven musical composition using large language model with memory-augmented state space

Deep learning for music generation: challenges and directions

Modeling Musical Structure with Artificial Neural Networks

Melody generation based on deep ensemble learning using varying temporal context length

MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks

The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines?

Deep Learning-Based Music Generation

Retrieval Augmented Generation of Symbolic Music with LLMs

LSTM Based Music Generation System

Simple and Controllable Music Generation

MuPT: A Generative Symbolic Music Pretrained Transformer

Music Generation System for Adversarial Training Based on Deep Learning

Using a Bi-directional LSTM Model with Attention Mechanism trained on MIDI Data for Generating Unique Music