Decoding visual brain representations from electroencephalography through Knowledge Distillation and latent diffusion models

Matteo Ferrante, Tommaso Boccato, Stefano Bargione, Nicola Toschi
2023-09-08
Abstract: Decoding visual representations from human brain activity has emerged as a thriving research domain, particularly in the context of brain-computer interfaces. Our study presents an innovative method that employs knowledge distillation and latent diffusion models to classify and reconstruct images from the ImageNet dataset using electroencephalography (EEG) data from subjects who had viewed the images themselves (i.e., "brain decoding"). We analyzed EEG recordings from 6 participants, each exposed to 50 images spanning 40 unique semantic categories. These EEG readings were converted into spectrograms, which were then used to train a convolutional neural network (CNN), integrated with a knowledge distillation procedure based on a pre-trained Contrastive Language-Image Pre-Training (CLIP)-based image classification teacher network. This strategy allowed our model to attain a top-5 accuracy of 80%, significantly outperforming a standard CNN and various RNN-based benchmarks. Additionally, we incorporated an image reconstruction mechanism based on pre-trained latent diffusion models, which allowed us to generate an estimate of the images that had elicited the EEG activity. Our architecture therefore not only decodes images from neural activity but also offers a credible image reconstruction from EEG alone, paving the way for, e.g., swift, individualized feedback experiments. Our research represents a significant step forward in connecting neural signals with visual cognition.
Signal Processing, Artificial Intelligence, Machine Learning, Neurons and Cognition
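The abstract describes training the EEG-spectrogram CNN (the student) with guidance from a CLIP-based teacher classifier, but it does not state the exact loss. The following is a minimal sketch of one common formulation of knowledge distillation (a cross-entropy term on the true label plus a temperature-softened KL term against the teacher). The function names, `temperature`, and `alpha` are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Assumed loss: alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student).

    `temperature` and `alpha` are hypothetical hyperparameters, not values from the paper.
    """
    # Hard-label term: cross-entropy of the student prediction at temperature 1.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # Soft-label term: KL divergence between the softened teacher and student distributions.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)

    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl


# Toy example with 3 classes; in the paper's setting there would be 40 class logits.
loss = distillation_loss([0.1, 1.0, 2.0], [2.0, 1.0, 0.1], label=0)
```

In this sketch the soft-label term vanishes when the student reproduces the teacher's logits exactly, so the teacher acts as an additional training signal on top of the ground-truth label.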
What problem does this paper attempt to address?
The paper addresses the problem of decoding visual representations from human brain activity recorded via electroencephalography (EEG). It proposes a method that uses knowledge distillation and pre-trained latent diffusion models to classify and reconstruct images. Specifically, the research objectives include:

1. **Classification and Reconstruction**: Predicting and reconstructing, from EEG data, the ImageNet images that participants viewed.
2. **Improving Accuracy**: Enhancing the classification performance of convolutional neural networks (CNNs) through knowledge distillation, in particular achieving significantly better top-5 accuracy than baseline models.
3. **Individualized Models**: Developing models tailored to individual participants to capture unique neural representations, thereby providing more detailed decoding results.
4. **Real-time Applications**: Achieving near real-time brain decoding to lay the foundation for practical applications such as brain-computer interfaces and biofeedback systems.

In summary, this research is dedicated to decoding complex visual stimuli from EEG signals, advancing the connection between visual cognition and neural signals, and exploring potential applications in real-world scenarios.
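Since the headline result is a top-5 accuracy, it may help to see how that metric is computed. The sketch below is a plain-Python illustration; the function name `top_k_accuracy` and the toy scores are assumptions for demonstration, not data or code from the paper. For reference, with 40 balanced classes, random guessing would yield a top-5 accuracy of 5/40 = 12.5%.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of trials whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for trial_scores, label in zip(scores, labels):
        # Rank class indices from highest to lowest score and keep the first k.
        ranked = sorted(range(len(trial_scores)),
                        key=lambda c: trial_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 3 trials, 4 classes (hypothetical scores, not results from the paper).
toy_scores = [
    [0.10, 0.50, 0.30, 0.10],
    [0.70, 0.15, 0.10, 0.05],
    [0.30, 0.10, 0.50, 0.10],
]
toy_labels = [2, 3, 0]
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)  # 2 of 3 trials are hits
```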