Brain2Music: Reconstructing Music from Human Brain Activity

Timo I. Denk, Yu Takagi, Takuya Matsuyama, Andrea Agostinelli, Tomoya Nakai, Christian Frank, Shinji Nishimoto
2023-07-21
Abstract: The process of reconstructing experiences from human brain activity offers a unique lens into how the brain interprets and represents the world. In this paper, we introduce a method for reconstructing music from brain activity, captured using functional magnetic resonance imaging (fMRI). Our approach uses either music retrieval or the MusicLM music generation model conditioned on embeddings derived from fMRI data. The generated music resembles the musical stimuli that human subjects experienced, with respect to semantic properties like genre, instrumentation, and mood. We investigate the relationship between different components of MusicLM and brain activity through a voxel-wise encoding modeling analysis. Furthermore, we discuss which brain regions represent information derived from purely textual descriptions of music stimuli. We provide supplementary material including examples of the reconstructed music at https://google-research.github.io/seanet/brain2music
Neurons and Cognition, Machine Learning, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
The core problem this paper addresses is how to reconstruct music from human brain activity captured by functional magnetic resonance imaging (fMRI). Specifically, the researchers either retrieve existing music or condition the MusicLM music generation model on music embeddings predicted from fMRI data, producing music that resembles the original stimuli in semantic attributes such as genre, instrumentation, and mood. They also study the relationship between different components of MusicLM and brain activity through voxel-wise encoding modeling, and discuss which brain regions represent information derived purely from text descriptions of the music stimuli.

### Main contributions:

1. **Music Reconstruction**: Music is reconstructed from fMRI scans by predicting high-dimensional, semantically structured music embeddings and using a deep neural network to generate music from them. Evaluations show that the reconstructed music is semantically similar to the original stimuli.
2. **Prediction of Activity in the Brain's Auditory Cortex**: Different components of the music generation model can predict activity in the human auditory cortex. The distinction between low-level and high-level representations is less pronounced in the auditory cortex than the corresponding distinction for visual stimuli in the visual cortex.
3. **Overlapping Prediction in the Auditory Cortex**: Within the auditory cortex, the voxels predicted from purely textual descriptions of music overlap substantially with the voxels predicted from the music itself.

### Method Overview:

- **Dataset**: The music genre neuroimaging dataset by Nakai et al. (2022), which contains 540 music segments from 10 genres.
- **Model**: The MuLan joint text/music embedding model and the MusicLM conditional music generation model. MuLan maps music and text into a shared 128-dimensional embedding space, and MusicLM generates music conditioned on these embeddings.
- **Decoding Process**: fMRI responses are mapped to MuLan embeddings by linear regression, and MusicLM then generates music from the predicted embeddings. As an alternative, similar music is retrieved from an existing music library.
- **Evaluation Metrics**: Identification accuracy and top-n agreement of AudioSet classes are used to evaluate the quality of the reconstructed music.

### Results:

- **Music Embedding Prediction**: MuLan music embeddings can be predicted from fMRI signals more accurately than the other embedding types tested (MuLan text embeddings, averaged w2v-BERT embeddings, and averaged SoundStream embeddings).
- **Qualitative Reconstruction Results**: Music retrieved from the Free Music Archive (FMA) and music generated by MusicLM are both semantically similar to the original stimuli, but the temporal structure is often not fully preserved.
- **Quantitative Reconstruction Evaluation**: Performance is significantly above chance on all metrics, supporting the feasibility of reconstructing music from fMRI data.

In conclusion, this paper demonstrates initial success in reconstructing music from human brain activity, providing a new perspective on how the brain processes and represents music.
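
The decoding and retrieval steps summarized above can be illustrated with a small sketch. This is not the authors' code: random synthetic arrays stand in for fMRI responses and for 128-dimensional MuLan embeddings, the ridge penalty `alpha` is an assumption (the summary only says "linear regression"), and the helper names `fit_ridge`, `retrieve`, and `identification_accuracy` are hypothetical. The accuracy function uses one common pairwise definition of an identification-style metric, which may differ in detail from the one used in the paper.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def l2_normalize(E, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return E / (np.linalg.norm(E, axis=1, keepdims=True) + eps)


def retrieve(pred, library):
    """For each predicted embedding, return the index of the most similar library item."""
    sims = l2_normalize(pred) @ l2_normalize(library).T
    return sims.argmax(axis=1)


def identification_accuracy(pred, target):
    """Fraction of (i, j != i) pairs where prediction i correlates more
    strongly with its own target i than with target j (assumed definition)."""
    n = len(pred)
    # corr[i, j] is the Pearson correlation between prediction i and target j.
    corr = np.corrcoef(pred, target)[:n, n:]
    own = np.diag(corr)[:, None]
    wins = (own > corr).sum()  # diagonal entries never count as wins
    return wins / (n * (n - 1))


# Synthetic stand-ins (assumptions): rows are stimuli, columns are voxels or
# embedding dimensions. Real data would come from fMRI scans and MuLan.
rng = np.random.default_rng(0)
n_train, n_test, n_voxels, dim = 200, 30, 50, 128
W_true = rng.normal(size=(n_voxels, dim))
X_train = rng.normal(size=(n_train, n_voxels))
X_test = rng.normal(size=(n_test, n_voxels))
Y_train = X_train @ W_true + 0.1 * rng.normal(size=(n_train, dim))
Y_test = X_test @ W_true + 0.1 * rng.normal(size=(n_test, dim))

# Decode: fit the linear map on training data, predict held-out embeddings.
W_hat = fit_ridge(X_train, Y_train, alpha=1.0)
Y_pred = X_test @ W_hat

# Retrieve: treat the held-out embeddings as a tiny "music library".
retrieved = retrieve(Y_pred, Y_test)
acc = identification_accuracy(Y_pred, Y_test)
print("retrieval hit rate:", (retrieved == np.arange(n_test)).mean())
print("identification accuracy:", acc)
```

In the generation variant, the predicted embedding would be passed to MusicLM as conditioning instead of being matched against a library. That step is omitted here because it depends on the MusicLM model itself. Under this pairwise definition, chance-level identification accuracy is 0.5.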