Image Reconstruction from Electroencephalography Using Latent Diffusion

Teng Fei, Virginia de Sa
2024-04-02
Abstract: In this work, we have adopted the diffusion-based image reconstruction pipeline previously used for functional magnetic resonance imaging (fMRI) image reconstruction and applied it to electroencephalography (EEG). The EEG encoding method is very simple and forms a baseline against which more sophisticated EEG encoding methods can be compared. We have also evaluated the fidelity of the generated images using the same metrics used in the previous fMRI and magnetoencephalography (MEG) works. Our results show that while the reconstruction from EEG recorded to rapidly presented images is not as good as reconstructions from fMRI to slower presented images, it holds a surprising amount of information that could be applied in specific use cases. Also, EEG-based image reconstruction works better in some categories (such as land animals and food) than others, shedding new light on previous findings of EEG's sensitivity to those categories and revealing potential for these methods to further understand EEG responses to human visual coding. More investigation should use longer-duration image stimulations to elucidate the later components that might be salient to the different image categories.
Neurons and Cognition, Human-Computer Interaction
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to utilize image reconstruction techniques based on diffusion models to reconstruct visual images from Electroencephalography (EEG) data. Specifically, the researchers attempt to apply the diffusion model methods previously used for functional Magnetic Resonance Imaging (fMRI) image reconstruction to EEG data, to evaluate its image reconstruction capability under Rapid Serial Visual Presentation (RSVP) conditions.

### Main Research Objectives

1. **Evaluate the feasibility of EEG image reconstruction**: The researchers aim to experimentally verify whether it is possible to reconstruct visual images similar to the original ones from EEG signals.
2. **Compare reconstruction effects of different image categories**: Analyze the reconstruction quality of different categories of images (such as land animals, food, etc.) and explore the sensitivity of EEG to these categories.
3. **Explore the impact of time windows**: Study the effect of EEG signals from different time windows (such as 200ms, 400ms, etc.) on image reconstruction quality.
4. **Evaluate the contribution of different embedding spaces**: Through model ablation experiments, evaluate the relative contribution of AutoKL, CLIP-Vision, and CLIP-Text embedding spaces to image reconstruction performance.

### Research Background

- **Importance of visual perception**: Visual perception is a crucial aspect of human cognition, essential for understanding more complex cognitive processes such as visual imagination and dream vision.
- **Development of functional neuroimaging**: With the development of functional neuroimaging, researchers can use these technologies to decode visual information, such as using Receptive Field Models to decode images from fMRI data.
- **Application of diffusion models**: The introduction of diffusion models has made image reconstruction more vivid and interpretable, capable of generating images with semantic content, allowing the use of deep neural network models to compare the high-level semantic similarity between generated images and original images.
- **Advantages and limitations of EEG**: Compared to fMRI, EEG has higher temporal resolution and lower cost, but its spatial resolution is lower and is affected by volume conduction, limiting its application in image reconstruction.

### Research Methods

1. **Dataset**: Use the THINGS-EEG2 dataset, which contains data from 17 posterior EEG channels.
2. **Image reconstruction process**: Divided into two stages:
   - First stage: Map the EEG signals to the latent space of a Variational Autoencoder (VAE) to generate a rough visual representation.
   - Second stage: Map the same EEG signals to the CLIP-Vision and CLIP-Text embedding spaces, combine the images generated by the VAE, and use the Versatile Diffusion model to generate the final reconstructed images.
3. **Performance evaluation metrics**: Use multiple performance metrics (such as pixel-level correlation, Structural Similarity Index SSIM, AlexNet, Inception, CLIP, etc.) to evaluate the quality of the reconstructed images.
4. **Model ablation experiments**: Evaluate the impact of removing or replacing certain embedding spaces on image reconstruction performance.
5. **Time window swapping experiments**: Study the impact of specific time periods on image reconstruction by swapping EEG data from different time periods.

### Results

1. **Basic performance metrics**: Results show that using 400ms data yields slightly better reconstruction effects than using 200ms data.
2. **Model ablation experiments**: The complete model performs best on all performance metrics, with the model without CLIP-Text performing slightly better than the model without CLIP-Vision.
3. **Time window swapping experiments**: Image reconstruction is significantly affected in the 100-380ms time period, indicating that EEG signals in this time period are crucial for image reconstruction.

### Discussion

- **Choice of diffusion models**: To fairly compare results from different studies, it is recommended to adapt to the latest diffusion models.
- **Spatial resolution limitations of EEG**: The low spatial resolution of EEG limits the fidelity of image reconstruction.
- **Future research directions**: Future research can extend...
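
### Illustrative Sketch

The summary states only that EEG signals are "mapped" to the VAE and CLIP embedding spaces and that time windows are "swapped"; it does not give the regression model or the swapping procedure. The sketch below is therefore an illustration under stated assumptions, not the authors' implementation. It assumes a closed-form ridge regression for the mapping, uses synthetic data in place of THINGS-EEG2, and treats a swap as replacing one window of each trial with the same window from a different trial. The names `fit_ridge`, `swap_window`, and `mean_row_correlation` are hypothetical, and the correlation is computed on predicted embeddings rather than on reconstructed images.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def swap_window(eeg, start, stop):
    """Replace samples [start, stop) of each trial with those of the previous trial."""
    swapped = eeg.copy()
    donor = np.roll(eeg, 1, axis=0)  # trial i receives data from trial i - 1
    swapped[:, :, start:stop] = donor[:, :, start:stop]
    return swapped


def mean_row_correlation(A, B):
    """Average Pearson correlation between matching rows of A and B."""
    A = A - A.mean(axis=1, keepdims=True)
    B = B - B.mean(axis=1, keepdims=True)
    num = (A * B).sum(axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return float(np.mean(num / den))


# Synthetic stand-in data; all shapes are hypothetical except the 17 channels.
rng = np.random.default_rng(0)
n_trials, n_train, n_channels, n_samples, latent_dim = 600, 500, 17, 20, 8
eeg = rng.standard_normal((n_trials, n_channels, n_samples))
true_W = rng.standard_normal((n_channels * n_samples, latent_dim))
# Pretend image embeddings: a noisy linear function of the flattened EEG.
latents = eeg.reshape(n_trials, -1) @ true_W
latents += 0.1 * rng.standard_normal(latents.shape)

X_train, Y_train = eeg[:n_train].reshape(n_train, -1), latents[:n_train]
X_test, Y_test = eeg[n_train:], latents[n_train:]
W = fit_ridge(X_train, Y_train, alpha=1.0)

# Prediction quality with intact EEG versus EEG with one window swapped.
baseline = mean_row_correlation(X_test.reshape(len(X_test), -1) @ W, Y_test)
swapped = swap_window(X_test, start=5, stop=15)
after_swap = mean_row_correlation(swapped.reshape(len(swapped), -1) @ W, Y_test)
print(f"baseline r = {baseline:.3f}, after swap r = {after_swap:.3f}")
```

Sliding `start` and `stop` across the epoch and recording the drop in correlation for each position would indicate which time periods the mapping relies on most, which is the kind of question the time window swapping experiments address.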