Efficient Fine-Grained Guidance for Diffusion-Based Symbolic Music Generation

Tingyu Zhu, Haoyu Liu, Ziyu Wang, Zhimin Jiang, Zeyu Zheng
2025-02-03
Abstract: Developing generative models to create or conditionally create symbolic music presents unique challenges due to the combination of limited data availability and the need for high precision in note pitch. To address these challenges, we introduce an efficient Fine-Grained Guidance (FGG) approach within diffusion models. FGG guides the diffusion models to generate music that aligns more closely with the control and intent of expert composers, which is critical for improving the accuracy, listenability, and quality of generated music. This approach empowers diffusion models to excel in advanced applications such as improvisation and interactive music creation. We derive theoretical characterizations for both the challenges in symbolic music generation and the effects of the FGG approach. We provide numerical experiments and subjective evaluation to demonstrate the effectiveness of our approach. We have published a demo page to showcase performances; it is one of the first demo pages in the symbolic music literature to enable real-time interactive generation.
Sound, Artificial Intelligence, Machine Learning, Multimedia, Audio and Speech Processing