DreamCatcher: Revealing the Language of the Brain with fMRI using GPT Embedding

Subhrasankar Chatterjee, Debasis Samanta
2023-06-16
Abstract: The human brain possesses remarkable abilities in visual processing, including image recognition and scene summarization. Efforts have been made to understand the cognitive capacities of the visual brain, but a comprehensive understanding of the underlying mechanisms still needs to be discovered. Advancements in brain decoding techniques have led to sophisticated approaches like fMRI-to-Image reconstruction, which has implications for cognitive neuroscience and medical imaging. However, challenges persist in fMRI-to-image reconstruction, such as incorporating global context and contextual information. In this article, we propose fMRI captioning, where captions are generated based on fMRI data to gain insight into the neural correlates of visual perception. This research presents DreamCatcher, a novel framework for fMRI captioning. DreamCatcher consists of the Representation Space Encoder (RSE) and the RevEmbedding Decoder, which transform fMRI vectors into a latent space and generate captions, respectively. We evaluated the framework through visualization, dataset training, and testing on subjects, demonstrating strong performance. fMRI-based captioning has diverse applications, including understanding neural mechanisms, Human-Computer Interaction, and enhancing learning and training processes.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper attempts to address the limitations of current fMRI-to-Image reconstruction techniques in understanding the neural mechanisms of visual perception. Specifically, existing fMRI-to-Image reconstruction methods face the following challenges:

1. **Insufficient Global Context Understanding**: Most frameworks can only capture local features when reconstructing images, failing to understand the global contextual information of the image.
2. **Balance Between Low-Level and High-Level Features**: While they can effectively capture low-level object features, they perform poorly in reconstructing high-level features such as the overall layout of a scene.

To overcome these challenges, the paper proposes a new research direction called fMRI captioning. By generating captions based on fMRI data, researchers hope to gain a deeper understanding of the neural basis of visual perception. Specifically, the paper introduces the DreamCatcher framework, which includes two main components:

- **Representation Space Encoder (RSE)**: Converts preprocessed fMRI vectors into a 1536-dimensional GPT embedding space.
- **RevEmbedding Decoder**: Converts vectors in the GPT embedding space into natural language captions.

Through this framework, researchers hope to capture not only low-level object features but also incorporate high-level contextual information of the image, achieving a more comprehensive and coherent reconstruction of visual stimuli. Additionally, fMRI captioning technology has broad application prospects in understanding neural mechanisms, human-computer interaction, education, and other fields.
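To make the two-stage data flow concrete, here is a minimal sketch of the shape of such a pipeline. It is not the authors' implementation. The summary does not specify how the RSE or the RevEmbedding Decoder are built, so both stages are swapped for simple stand-ins: a fixed random linear projection plays the role of the encoder, and a nearest-neighbour lookup over a small bank of pre-embedded captions plays the role of the decoder. The names `make_linear_encoder`, `decode_by_retrieval`, and the toy caption bank are all hypothetical. Only the 1536-dimensional embedding size comes from the text.

```python
import math
import random

EMBED_DIM = 1536  # size of the GPT embedding space stated in the summary


def make_linear_encoder(n_voxels, embed_dim=EMBED_DIM, seed=0):
    """Hypothetical stand-in for the RSE: a fixed random linear map
    from an fMRI vector (n_voxels values) to an embed_dim vector.
    A real encoder would be trained; this one is not."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_voxels)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_voxels)]
               for _ in range(embed_dim)]

    def encode(fmri_vector):
        if len(fmri_vector) != n_voxels:
            raise ValueError("expected %d voxels" % n_voxels)
        return [sum(w * x for w, x in zip(row, fmri_vector))
                for row in weights]

    return encode


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def decode_by_retrieval(embedding, caption_bank):
    """Hypothetical stand-in for the RevEmbedding Decoder: return the
    caption whose stored embedding is most similar to the input.
    caption_bank is a list of (caption, embedding) pairs."""
    best_caption, _ = max(caption_bank,
                          key=lambda item: cosine(embedding, item[1]))
    return best_caption


# Toy usage with made-up 8-voxel "scans" (illustrative data only).
encode = make_linear_encoder(n_voxels=8)
scan_a = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
scan_b = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bank = [("a dog on a beach", encode(scan_a)),
        ("a plate of food", encode(scan_b))]
caption = decode_by_retrieval(encode(scan_a), bank)
```

The point of the sketch is the interface: an fMRI vector goes in, a 1536-dimensional vector comes out of the first stage, and the second stage turns that vector into text. A retrieval decoder can only return captions it has already seen, whereas the framework described above generates captions, so the second stage here should be read as a placeholder for a generative model.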