Abstract: Automatic melody-to-lyric generation is a task in which song lyrics are generated to go with a given melody. It is of significant practical interest and more challenging than unconstrained lyric generation as the music imposes additional constraints onto the lyrics. The training data is limited as most songs are copyrighted, resulting in models that underfit the complicated cross-modal relationship between melody and lyrics. In this work, we propose a method for generating high-quality lyrics without training on any aligned melody-lyric data. Specifically, we design a hierarchical lyric generation framework that first generates a song outline and then the complete lyrics. The framework enables disentanglement of training (based purely on text) from inference (melody-guided text generation) to circumvent the shortage of parallel data. We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints as guidance during inference. The two-step hierarchical design also enables content control via the lyric outline, a much-desired feature for democratizing collaborative song creation. Experimental results show that our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines, for example SongMASS, a SOTA model trained on a parallel dataset, with a 24% relative overall quality improvement based on human ratings.

What problem does this paper attempt to address?

This paper addresses **automatic melody-to-lyric generation**: automatically generating lyrics that match a given melody. It focuses on three challenges:

1. **Data scarcity**: Most songs are protected by copyright, so aligned melody-lyric data is very limited. Models trained on such data struggle to learn the complex cross-modal relationship between melody and lyrics and tend to underfit.
2. **Complexity of melody and lyric modeling**: Compared with unimodal sequence-to-sequence tasks such as machine translation, the latent association between melody and lyrics is harder to capture. Existing methods use neural models such as RNN, LSTM, SeqGAN, or Transformer to learn this mapping, but the results are not ideal; for example, the generated lyrics may be incoherent or may not fit the singing rhythm.
3. **Controllability of lyric generation**: Existing methods offer little control over the content of the generated lyrics, such as specifying topics or keywords. Such control is an important requirement for democratizing collaborative music creation.

To address these challenges, the paper proposes **LYRA**, an unsupervised, hierarchical, melody-conditioned lyric generator. Its main features are:

- **Training without aligned data**: By building on large-scale pre-trained language models (PTLMs), LYRA generates high-quality lyrics without training on aligned melody-lyric data.
- **Hierarchical generation framework**: LYRA first generates a lyric outline and then the complete lyrics. This design improves the coherence and controllability of the generated content and lets the model follow user-specified topics or keywords.
- **Melody-guided inference**: At inference time, LYRA compiles the given melody into decoding constraints (such as segmentation and rhythm alignment) that guide lyric generation, so that the generated lyrics conform to the rhythm and structure of the melody.

In human evaluation, lyrics generated by LYRA outperform strong baselines such as SongMASS in topic relevance, singability, intelligibility, and coherence, with a 24% relative improvement in overall quality.

### Summary

With LYRA, the paper addresses data scarcity and modeling complexity in melody-to-lyric generation while enabling effective control over the generated content.
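To make the idea of compiling a melody into decoding constraints more concrete, here is a minimal Python sketch. It is not the paper's implementation: the summary above only states that segmentation and rhythm alignment are used, so the specific rules are assumptions made for illustration. The sketch assumes that rests mark segment boundaries, that each note carries one syllable, and that long notes should fall on stressed syllables. The names `STRESS`, `melody_to_constraints`, `fits`, and the `long_threshold` parameter are hypothetical.

```python
from typing import List

# Hypothetical toy lexicon: word -> per-syllable stress (1 = stressed, 0 = unstressed).
# A real system would need a pronunciation dictionary; this table is illustrative only.
STRESS = {
    "shining": [1, 0],
    "stars": [1],
    "the": [0],
    "river": [1, 0],
    "tonight": [0, 1],
}


def melody_to_constraints(
    durations: List[float], rest_after: List[bool], long_threshold: float = 1.0
) -> List[List[int]]:
    """Split a melody into segments at rests and mark each note long (1) or short (0).

    Assumption: a rest after a note closes the current segment (one lyric line).
    """
    segments: List[List[int]] = []
    current: List[int] = []
    for duration, is_boundary in zip(durations, rest_after):
        current.append(1 if duration >= long_threshold else 0)
        if is_boundary:
            segments.append(current)
            current = []
    if current:  # trailing notes without a closing rest still form a segment
        segments.append(current)
    return segments


def fits(words: List[str], pattern: List[int]) -> bool:
    """Check a candidate lyric line against one segment's rhythm pattern.

    Assumed rule: one syllable per note, and every long note needs a stressed
    syllable; short notes accept any syllable.
    """
    stresses = [s for word in words for s in STRESS[word]]
    if len(stresses) != len(pattern):
        return False
    return all(s >= p for s, p in zip(stresses, pattern))


# Eight notes with rests after the third and the last note.
durations = [1.0, 0.5, 1.0, 0.5, 1.0, 0.5, 0.5, 1.0]
rest_after = [False, False, True, False, False, False, False, True]
segments = melody_to_constraints(durations, rest_after)
```

During decoding, a check like `fits` could be used to filter or re-rank candidate lines for each segment, for example keeping `["shining", "stars"]` for the first segment and rejecting `["the", "river"]`, whose first syllable is unstressed while the first note is long. Because the constraints depend only on the melody, the language model itself can be trained on text alone, which is the disentanglement the abstract describes.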

Unsupervised Melody-to-Lyric Generation

Unsupervised Melody-Guided Lyrics Generation

Lyrics-Conditioned Neural Melody Generation

Automatic Neural Lyrics and Melody Composition

Neural Melody Composition from Lyrics

Melody Generation from Lyrics with Local Interpretability

SongGLM: Lyric-to-Melody Generation with 2D Alignment Encoding and Multi-Task Pre-Training

TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method

Conditional LSTM-GAN for Melody Generation from Lyrics

Modeling the Rhythm from Lyrics for Melody Generation of Pop Song

REFFLY: Melody-Constrained Lyrics Editing Model

Syllable-level lyrics generation from melody exploiting character-level language model

Melody Generation from Lyrics Using Three Branch Conditional LSTM-GAN

LOAF-M2L: Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation

Interpretable Melody Generation from Lyrics with Discrete-Valued Adversarial Training

Deep Attention-Based Alignment Network for Melody Generation from Incomplete Lyrics

A Syllable-Structured, Contextually-Based Conditionally Generation of Chinese Lyrics

ReLyMe: Improving Lyric-to-Melody Generation by Incorporating Lyric-Melody Relationships

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation

CSL-L2M: Controllable Song-Level Lyric-to-Melody Generation Based on Conditional Transformer with Fine-Grained Lyric and Musical Controls