Retrieval Augmented Generation of Symbolic Music with LLMs

Nicolas Jonason, Luca Casini, Carl Thomé, Bob L.T. Sturm
2023-12-28
Abstract: We explore the use of large language models (LLMs) for music generation using a retrieval system to select relevant examples. We find promising initial results for music generation in a dialogue with the user, especially considering the ease with which such a system can be implemented. The code is available online.
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
The paper explores how to combine large language models (LLMs) with retrieval-augmented generation (RAG) to generate symbolic music, particularly folk tunes. The system retrieves relevant music examples from a database in response to a user request, then passes those examples together with the request as a prompt to a Composer LLM, which generates new symbolic music. The main contributions of the paper are:

1. **System Design**: A music generation system based on retrieval-augmented generation that interprets user requests and supports generation by retrieving music examples from a database.
2. **Experimental Demonstrations**: Three experimental cases (conditional generation, style transfer, and music fragment completion) that illustrate what the system can do.
3. **Technical Details**: A description of the system components. The Retrieval LLM selects the most relevant music examples from the database using a simple tag matching strategy, and the Composer LLM generates new music from the resulting prompt.
4. **Future Work Outlook**: Plans to further develop the use of LLMs in music analysis and generation, and to evaluate the quality of the music generated by the proposed system.

In summary, the paper explores an easily implemented approach to generating symbolic music with LLMs and demonstrates its feasibility through these experiments; a formal evaluation of the quality of the generated music is left to future work.
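The retrieve-then-prompt flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Tune` class, the `retrieve` and `build_prompt` functions, the overlap-count scoring, the prompt wording, and the tiny ABC-style example tunes are all hypothetical, chosen only to show one plausible form of tag matching. The sketch also assumes the user request has already been reduced to a list of tags (the role the summary assigns to the Retrieval LLM) and stops before the call to the Composer LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tune:
    title: str
    tags: frozenset  # lowercase descriptive tags, e.g. tune type and key
    notation: str    # symbolic notation; ABC is assumed here for illustration


# Hypothetical three-tune database; the notation strings are placeholders.
DATABASE = [
    Tune("Tune A", frozenset({"jig", "d major"}), "X:1\nM:6/8\nK:D\nDFA dfa|"),
    Tune("Tune B", frozenset({"reel", "g major"}), "X:2\nM:4/4\nK:G\nGABc dedB|"),
    Tune("Tune C", frozenset({"jig", "a minor"}), "X:3\nM:6/8\nK:Am\nEAA ABc|"),
]


def retrieve(query_tags, database, k=2):
    """Return up to k tunes, ranked by how many tags they share with the query."""
    query = frozenset(tag.lower() for tag in query_tags)
    scored = [(len(query & tune.tags), tune) for tune in database]
    # Drop tunes with no shared tag, then sort by overlap (stable for ties).
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [tune for _, tune in matches[:k]]


def build_prompt(request, examples):
    """Concatenate the retrieved examples and the user request into one prompt."""
    parts = ["Here are some example tunes:"]
    for tune in examples:
        parts.append(f"Title: {tune.title}\n{tune.notation}")
    parts.append(f"User request: {request}")
    return "\n\n".join(parts)


examples = retrieve(["Jig", "D major"], DATABASE)
prompt = build_prompt("Write a lively jig in D major.", examples)
```

The resulting `prompt` string is what would be sent to the Composer LLM. Counting shared tags is only one way to read "simple tag matching"; a real system might weight tags or sample among ties.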