Large Language Models: From Notes to Musical Form

Lilac Atassi
2024-04-18
Abstract: While many topics of the learning-based approach to automated music generation are under active research, musical form is under-researched. In particular, recent methods based on deep learning models generate music that, at the largest time scale, lacks any structure. In practice, music longer than one minute generated by such models is either unpleasantly repetitive or directionless. Adapting a recent music generation model, this paper proposes a novel method to generate music with form. The experimental results show that the proposed method can generate 2.5-minute-long music that is considered as pleasant as the music used to train the model. The paper first reviews a recent music generation method based on language models (transformer architecture). We discuss why learning musical form by such models is infeasible. Then we discuss our proposed method and the experiments.
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the lack of large-scale structure (musical form) in automatically generated music. Existing methods based on deep-learning models produce music that has no structure at the largest time scale. As a result, pieces longer than one minute generated by these models tend to be either unpleasantly repetitive or directionless. The paper therefore proposes a new method for generating music with form and reports that it can produce 2.5-minute pieces judged as pleasant as the music used to train the model. With this method, the paper aims to overcome the inability of existing models to maintain musical form over long durations.