Language Models for Music Medicine Generation

Emmanouil Nikolakakis, Joann Ching, Emmanouil Karystinaios, Gabrielle Sipin, Gerhard Widmer, Razvan Marinescu
2024-11-14
Abstract: Music therapy has been shown in recent years to provide multiple health benefits related to emotional wellness. In turn, maintaining a healthy emotional state has proven to be effective for patients undergoing treatment, such as Parkinson's patients or patients suffering from stress and anxiety. We propose fine-tuning MusicGen, a music-generating transformer model, to create short musical clips that assist patients in transitioning from negative to desired emotional states. Using low-rank decomposition fine-tuning on the MTG-Jamendo Dataset with emotion tags, we generate 30-second clips that adhere to the iso principle, guiding patients through intermediate states in the valence-arousal circumplex. The generated music is evaluated using a music emotion recognition model to ensure alignment with intended emotions. By concatenating these clips, we produce a 15-minute "music medicine" resembling a music therapy session. Our approach is the first model to leverage Language Models to generate music medicine. Ultimately, the output is intended to be used as a temporary relief between music therapy sessions with a board-certified therapist.
Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses how language models can generate music clips that help patients move from a negative emotional state to a desired one, as a supplement to music therapy. The authors fine-tune MusicGen, a music-generating transformer model, to produce short clips that follow the iso principle: rather than jumping straight to the target emotion, the clips guide the patient through intermediate emotional states so that the transition is gradual. The generated music is intended as temporary relief between sessions with a music therapist. Its aim is to help patients maintain a healthy emotional state, which the abstract notes has proven effective for patients undergoing treatment, such as those with Parkinson's disease or those suffering from stress and anxiety. The main contribution claimed is the first use of a language model to generate so-called "music medicine": a 15-minute audio piece, produced through continuous conditional music generation, that resembles a music therapy session. The method relies on the emotion tags of the training music and uses low-rank decomposition fine-tuning to make the model more sensitive to specific emotion tags, so that the generated music reflects the intended emotional progression. The paper also describes continuous prompt engineering and the use of the temperature variable to vary instruments and musical styles as the patient's emotional state changes.
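The iso-principle schedule described above can be illustrated with a small sketch. It is not the authors' implementation: the `EmotionPoint` type, the `iso_trajectory` helper, the [-1, 1] coordinate range, the example start and target values, and the choice of straight-line interpolation between the two states are all assumptions made for illustration. Only the 30-second clip length and the 15-minute session length come from the abstract.

```python
from dataclasses import dataclass

CLIP_SECONDS = 30          # clip length stated in the abstract
SESSION_SECONDS = 15 * 60  # session length stated in the abstract


@dataclass(frozen=True)
class EmotionPoint:
    """A point in the valence-arousal plane (hypothetical [-1, 1] scale)."""
    valence: float
    arousal: float


def iso_trajectory(start: EmotionPoint, target: EmotionPoint,
                   n_clips: int) -> list[EmotionPoint]:
    """Return one emotion target per clip, moving gradually from start to target.

    Linear interpolation is an assumption; the source only says that the
    clips pass through intermediate states.
    """
    if n_clips < 2:
        raise ValueError("need at least two clips to form a transition")
    points = []
    for i in range(n_clips):
        t = i / (n_clips - 1)  # 0.0 for the first clip, 1.0 for the last
        points.append(EmotionPoint(
            valence=start.valence + t * (target.valence - start.valence),
            arousal=start.arousal + t * (target.arousal - start.arousal),
        ))
    return points


# 900 seconds // 30 seconds = 30 clips per session
n_clips = SESSION_SECONDS // CLIP_SECONDS

# Illustrative values: an anxious state (low valence, high arousal)
# moving toward a calm, positive state (high valence, low arousal).
start = EmotionPoint(valence=-0.8, arousal=0.6)
target = EmotionPoint(valence=0.6, arousal=-0.4)
path = iso_trajectory(start, target, n_clips)

for index, point in enumerate(path[:3]):
    print(f"clip {index}: valence={point.valence:.2f}, arousal={point.arousal:.2f}")
```

In a full pipeline, each point on the path would be turned into a conditioning prompt for one clip, and the resulting clips concatenated in order. How a valence-arousal point maps to emotion tags or prompt text is not specified in this summary, so that step is left out here.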