Abstract: We present SongComposer, an innovative LLM designed for song composition. It could understand and generate melodies and lyrics in symbolic song representations, by leveraging the capability of LLM. Existing music-related LLM treated the music as quantized audio signals, while such implicit encoding leads to inefficient encoding and poor flexibility. In contrast, we resort to symbolic song representation, the mature and efficient way humans designed for music, and enable LLM to explicitly compose songs like humans. In practice, we design a novel tuple design to format lyric and three note attributes (pitch, duration, and rest duration) in the melody, which guarantees the correct LLM understanding of musical symbols and realizes precise alignment between lyrics and melody. To impart basic music understanding to LLM, we carefully collected SongCompose-PT, a large-scale song pretraining dataset that includes lyrics, melodies, and paired lyrics-melodies in either Chinese or English. After adequate pre-training, 10K carefully crafted QA pairs are used to empower the LLM with the instruction-following capability and solve diverse tasks. With extensive experiments, SongComposer demonstrates superior performance in lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation, outperforming advanced LLMs like GPT-4.
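The tuple design mentioned in the abstract can be pictured with a minimal sketch. The delimiters, the token spelling, the use of seconds as the unit, and the one-note-per-word simplification are all assumptions made for illustration; the abstract does not specify the exact serialization, and `Note` and `format_song` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Note:
    pitch: str        # scientific pitch name, e.g. "F4"
    duration: float   # how long the note is held (seconds assumed)
    rest: float       # silence after the note (seconds assumed)


def format_song(words: List[str], notes: List[Note]) -> str:
    """Serialize word-aligned lyrics and notes as one text sequence.

    Each unit is "word,<pitch>,<duration>,<rest>"; units are joined by "|".
    This layout is an illustrative assumption, not the paper's exact format.
    """
    if len(words) != len(notes):
        raise ValueError("this sketch needs exactly one note per lyric word")
    units = [
        f"{w},<{n.pitch}>,<{n.duration:.2f}>,<{n.rest:.2f}>"
        for w, n in zip(words, notes)
    ]
    return "|".join(units)


line = format_song(
    ["twinkle", "star"],
    [Note("C4", 0.5, 0.0), Note("G4", 1.0, 0.25)],
)
print(line)  # twinkle,<C4>,<0.50>,<0.00>|star,<G4>,<1.00>,<0.25>
```

Keeping each word and its note attributes in the same unit is what makes the lyric-melody alignment explicit in the text sequence.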

What problem does this paper attempt to address?

This paper asks how large language models (LLMs) can be used to compose lyrics and melodies, and in particular how to generate melodies that both fit the lyrics and sound harmonious when songs are represented symbolically. It proposes SongComposer, an LLM that understands and generates symbolic song representations covering both melodies and lyrics. Existing music-related LLMs usually treat music as a quantized audio signal, which results in low encoding efficiency and poor flexibility. SongComposer instead adopts symbolic song representation, the mature and efficient notation humans designed for music, so that the model composes songs explicitly, as a human would.

### Main Challenges and Solutions:

1. **How to make large language models learn music**:
   - Music symbols are abstract and are used differently from ordinary letters. For example, "F4" in a melody means "the F note in the fourth octave", not a paper size or a meaningless combination of characters.
   - To this end, the authors decompose the melody into triples of three attributes (pitch, duration, and rest duration) and add them as tokens in a new vocabulary. This retains the actual meaning of the music symbols, rather than reusing tokens from the original vocabulary or adopting an abstract representation with a separate audio encoder.
   - For paired lyric and melody data, the authors align lyrics and melody at the word level, forming a sequence of lyric words with their corresponding musical attributes.
2. **What music knowledge do large language models need to learn?**:
   - The model needs to acquire basic music understanding from large-scale song data, including melodies, lyrics, and their precise alignment.
   - To this end, the authors compile a comprehensive pre-training dataset, SongCompose-PT, which contains 280,000 lyrics-only songs, 20,000 melody-only pieces, and 15,000 paired lyrics and melodies, covering Chinese and English.
3. **How to make large language models create songs according to instructions?**:
   - After sufficient pre-training, the model can understand and continue songs, but it still has difficulty following flexible instructions.
   - To overcome this, the authors design 10,000 question-and-answer style dialogues that enable the model to solve a wide range of song-generation tasks.

### Main Contributions:

- **Proposing SongComposer**: A large language model designed specifically for song composition, which generates melodies and lyrics using symbolic song representations, with better token efficiency, precise representation, a flexible format, and human-readable output.
- **Compiling the SongCompose-PT dataset**: A comprehensive pre-training dataset containing lyrics, melodies, and paired lyrics-melodies, covering Chinese and English.
- **Experimental Results**: Extensive experiments show that SongComposer outperforms advanced large language models such as GPT-4 on lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation.

### Summary:

The paper applies large language models to song composition by introducing a symbolic song representation and carefully designed datasets, and it reports significant progress in the joint generation of lyrics and melodies.
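To make the "F4" example concrete, the sketch below treats a pitch name as a musical quantity rather than a string of letters, then builds one dedicated token per pitch. It assumes the common convention that C4 maps to MIDI note 60; the token spelling `<pitch_N>` and the helper names are hypothetical, and the summary does not state how SongComposer actually spells or indexes its new vocabulary.

```python
import re
from typing import List

# Semitone offset of each natural note within an octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
PITCH_RE = re.compile(r"^([A-G])([#b]?)(-?\d+)$")


def pitch_to_midi(name: str) -> int:
    """Map a pitch name such as "F4" to a MIDI note number (assumes C4 = 60)."""
    m = PITCH_RE.match(name)
    if m is None:
        raise ValueError(f"not a pitch name: {name!r}")
    letter, accidental, octave = m.groups()
    offset = {"": 0, "#": 1, "b": -1}[accidental]
    return 12 * (int(octave) + 1) + SEMITONES[letter] + offset


def build_pitch_tokens(low: str, high: str) -> List[str]:
    """One new vocabulary token per semitone in [low, high] (hypothetical spelling)."""
    return [f"<pitch_{n}>" for n in range(pitch_to_midi(low), pitch_to_midi(high) + 1)]


print(pitch_to_midi("F4"))             # 65
print(build_pitch_tokens("C4", "E4"))  # five tokens, <pitch_60> through <pitch_64>
```

Giving each pitch its own token keeps "F4" from being split into the unrelated text pieces "F" and "4", which is the failure mode the summary describes.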

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation

ByteComposer: a Human-like Melody Composition Method based on Language Model Agent

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

SongCreator: Lyrics-based Universal Song Generation

ChatMusician: Understanding and Generating Music Intrinsically with LLM

MuPT: A Generative Symbolic Music Pretrained Transformer

Unsupervised Melody-to-Lyric Generation

Neural Melody Composition from Lyrics

Generating Symbolic Music from Natural Language Prompts using an LLM-Enhanced Dataset

Unsupervised Melody-Guided Lyrics Generation

Accompanied Singing Voice Synthesis with Fully Text-controlled Melody

Agent-Driven Large Language Models for Mandarin Lyric Generation

Video-driven musical composition using large language model with memory-augmented state space

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint

LOAF-M2L: Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation

Conditional LSTM-GAN for Melody Generation from Lyrics

Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

Automatic Neural Lyrics and Melody Composition

Symphony Generation with Permutation Invariant Language Model

MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation

Modeling the Rhythm from Lyrics for Melody Generation of Pop Song