Abstract: Recently, electroencephalography (EEG) signals have been actively used to decode brain responses to visual or textual stimuli and to achieve object recognition in multi-modal AI. Accordingly, efforts have focused on building EEG-based datasets from single-modal visual or textual stimuli. However, these datasets offer limited EEG epochs per category, and the complex semantics of the stimuli presented to participants compromise their quality and fidelity in capturing precise brain activity. Studies in neuroscience reveal that the relationship between visual and textual stimuli in EEG recordings provides valuable insights into the brain's ability to process and integrate multi-modal information simultaneously. Inspired by this, we propose a novel large-scale multi-modal dataset, named EIT-1M, with over 1 million EEG-image-text pairs. Our dataset is superior in its capacity to reflect brain activity during the simultaneous processing of multi-modal information. To achieve this, we collected data pairs while participants viewed alternating sequences of visual-textual stimuli from 60K natural images and category-specific texts. Common semantic categories are also included to elicit better reactions from participants' brains. Meanwhile, response-based stimulus timing and repetition across blocks and sessions are included to ensure data diversity. To verify the effectiveness of EIT-1M, we provide an in-depth analysis of EEG data captured from multi-modal stimuli across different categories and participants, along with data quality scores for transparency. We demonstrate its validity on two tasks: 1) EEG recognition from visual or textual stimuli or both and 2) EEG-to-visual generation.

What problem does this paper attempt to address?

The paper attempts to address the shortcomings of existing EEG datasets in multimodal information processing. Specifically, existing EEG datasets have two main issues:

1. **Limited number of EEG samples per category**: This limits the quality and accuracy of the datasets, especially in capturing brain activity.
2. **Only single-modal stimuli (visual or textual)**: This makes the datasets difficult to use for training high-performance multimodal AI models.

To overcome these issues, the paper proposes a new large-scale multimodal dataset, EIT-1M, which contains over 1 million EEG-image-text pairs. By recording brain activity while participants view alternating visual and textual stimuli, EIT-1M aims to better reflect how the brain processes multimodal information simultaneously.

### Main Contributions

- **Large-scale dataset**: EIT-1M contains over 1 million EEG-image-text pairs, providing rich multimodal information.
- **High-quality data**: The experimental setup and data preprocessing are designed to ensure data quality and reliability, and data quality scores are reported for transparency.
- **Multimodal information processing**: The dataset covers both visual and textual stimuli, supporting the training and study of multimodal AI models.

### Experimental Validation

To validate the effectiveness of EIT-1M, the paper conducts two experiments:

1. **EEG recognition task**: Recognizing EEG signals recorded from visual stimuli, textual stimuli, or a combination of both.
2. **EEG-to-image generation task**: Generating images from EEG signals.

The reported results indicate that EIT-1M supports multimodal information processing well, with performance improving notably when visual and textual stimuli are combined.

### Conclusion and Future Work

The paper demonstrates the advantages of the EIT-1M dataset in multimodal information processing, providing a valuable resource for multimodal AI and cognitive neuroscience research. Future work can further expand the scale and diversity of the dataset, explore more application scenarios, and improve data collection and processing methods.
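
The abstract describes participants viewing alternating sequences of visual-textual stimuli, repeated across blocks and sessions. The following is a minimal sketch of how such a sequence could be assembled. It is not the authors' protocol: the `Stimulus` type, the function names, the image-then-text ordering, and the per-block reshuffling are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stimulus:
    modality: str  # "image" or "text"
    category: str
    content: str   # hypothetical: an image file name, or the category word itself


def build_alternating_block(images_by_category, seed=0):
    """Return one block in which every image is followed by its category text.

    Assumption: image-then-text ordering and shuffling within a block are
    illustrative choices, not the paper's documented procedure.
    """
    rng = random.Random(seed)
    items = [(cat, name)
             for cat, names in images_by_category.items()
             for name in names]
    rng.shuffle(items)
    block = []
    for cat, name in items:
        block.append(Stimulus("image", cat, name))
        block.append(Stimulus("text", cat, cat))
    return block


def build_session(images_by_category, n_blocks=2, seed=0):
    """Repeat the same stimuli across blocks, reshuffled with a per-block seed."""
    return [build_alternating_block(images_by_category, seed + b)
            for b in range(n_blocks)]


if __name__ == "__main__":
    # Toy stand-in for the 60K natural images mentioned in the abstract.
    toy = {"dog": ["dog_01.jpg", "dog_02.jpg"], "car": ["car_01.jpg"]}
    for i, block in enumerate(build_session(toy)):
        print(i, [(s.modality, s.content) for s in block])
```

Each recorded EEG epoch would then be matched to the stimulus shown at that position in the block, which is one way to arrive at EEG-image-text pairs. The pairing step is omitted here because the summary does not specify it.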

EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More
