Abstract: In recent years, artificial neural networks (ANNs) have become a universal tool for tackling real-world problems. ANNs have also shown great success in music-related tasks, including music summarization and classification, similarity estimation, computer-aided or autonomous composition, and automatic music analysis. As structure is a fundamental characteristic of Western music, it plays a role in all these tasks. Some structural aspects are particularly challenging to learn with current ANN architectures. This is especially true for mid- and high-level self-similarity and for tonal and rhythmic relationships. In this thesis, I explore the application of ANNs to different aspects of musical structure modeling, identify some of the challenges involved, and propose strategies to address them. First, a probabilistic bottom-up approach to melody segmentation is studied, which uses the probability estimates of a Restricted Boltzmann Machine (RBM). Then, a top-down method for imposing a high-level structural template in music generation is presented, which combines Gibbs sampling using a convolutional RBM with gradient-descent optimization on the intermediate solutions. Furthermore, I motivate the relevance of musical transformations in structure modeling and show how a connectionist model, the Gated Autoencoder (GAE), can be employed to learn transformations between musical fragments. For learning transformations in sequences, I propose a special predictive training of the GAE, which yields a representation of polyphonic music as a sequence of intervals. The applicability of these interval representations to a top-down discovery of repeated musical sections is also shown. Finally, a recurrent variant of the GAE is proposed, and its efficacy in music prediction and in modeling low-level repetition structure is demonstrated.
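
As a rough illustration of why an interval representation can help in discovering repeated sections, the sketch below computes hand-coded intervals for a monophonic pitch sequence and shows that they are unchanged under transposition. This is not the GAE described in the abstract, which learns such representations for polyphonic music; the function names `to_intervals` and `transpose` and the MIDI pitch encoding are assumptions made only for this example.

```python
from typing import List


def to_intervals(pitches: List[int]) -> List[int]:
    """Return the differences, in semitones, between consecutive pitches.

    Assumption for this sketch: pitches are MIDI note numbers of a
    monophonic melody. The thesis learns interval representations with a
    Gated Autoencoder instead of computing them by hand.
    """
    return [b - a for a, b in zip(pitches, pitches[1:])]


def transpose(pitches: List[int], semitones: int) -> List[int]:
    """Shift every pitch by the same number of semitones."""
    return [p + semitones for p in pitches]


# A hypothetical four-note motif and the same motif transposed up a fourth.
motif = [60, 62, 64, 60]
shifted = transpose(motif, 5)

# The pitch sequences differ, but their interval sequences are identical,
# so a repetition detector working on intervals treats them as the same.
print(to_intervals(motif))
print(to_intervals(shifted))
```

Under these assumptions, a transposed repetition of a fragment has the same interval sequence as the original, which is what makes matching on intervals more robust than matching on absolute pitches.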

Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models

Pretraining with Artificial Language: Studying Transferable Knowledge in Language Models

Injecting structural hints: Using language models to study inductive biases in language learning

Language Models are Drummers: Drum Composition with Natural Language Pre-Training

Algebraic structures emerge from the self-supervised learning of natural sounds

What makes a language easy to deep-learn? Deep neural networks and humans similarly benefit from compositional structure

Does injecting linguistic structure into language models lead to better alignment with brain recordings?

Are Human Learners Capable of Learning Arbitrary Language Structures

Modeling Musical Structure with Artificial Neural Networks

Syntactic Structure from Deep Learning

What Artificial Neural Networks Can Tell Us About Human Language Acquisition

Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training

Opening the black box of language acquisition

The Shared Neural Basis of Music and Language

Natural language instructions induce compositional generalization in networks of neurons

LLark: A Multimodal Instruction-Following Language Model for Music

Constructive Interrelationship Between Structural Components in Early Music and Language Learning

Examining the Inductive Bias of Neural Language Models with Artificial Languages

How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases

Neurophysiological Markers of Statistical Learning in Music and Language: Hierarchy, Entropy, and Uncertainty

The Grammar-Learning Trajectories of Neural Language Models