EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More

Xu Zheng, Ling Wang, Kanghao Chen, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang
2024-07-02
Abstract: Recently, electroencephalography (EEG) signals have been actively incorporated to decode brain activity to visual or textual stimuli and achieve object recognition in multi-modal AI. Accordingly, endeavors have been focused on building EEG-based datasets from visual or textual single-modal stimuli. However, these datasets offer limited EEG epochs per category, and the complex semantics of stimuli presented to participants compromise their quality and fidelity in capturing precise brain activity. The study in neuroscience unveils that the relationship between visual and textual stimulus in EEG recordings provides valuable insights into the brain's ability to process and integrate multi-modal information simultaneously. Inspired by this, we propose a novel large-scale multi-modal dataset, named EIT-1M, with over 1 million EEG-image-text pairs. Our dataset is superior in its capacity of reflecting brain activities in simultaneously processing multi-modal information. To achieve this, we collected data pairs while participants viewed alternating sequences of visual-textual stimuli from 60K natural images and category-specific texts. Common semantic categories are also included to elicit better reactions from participants' brains. Meanwhile, response-based stimulus timing and repetition across blocks and sessions are included to ensure data diversity. To verify the effectiveness of EIT-1M, we provide an in-depth analysis of EEG data captured from multi-modal stimuli across different categories and participants, along with data quality scores for transparency. We demonstrate its validity on two tasks: 1) EEG recognition from visual or textual stimuli or both and 2) EEG-to-visual generation.
Computer Vision and Pattern Recognition, Human-Computer Interaction
What problem does this paper attempt to address?
The paper addresses the shortcomings of existing EEG datasets for multimodal information processing. Specifically, existing EEG datasets have two main issues:

1. **Limited number of EEG samples per category**: This limits the quality and fidelity of the datasets, especially in capturing precise brain activity.
2. **Single-modal stimuli only (visual or textual)**: This makes these datasets difficult to use for training high-performance multimodal AI models.

To overcome these issues, the paper proposes a new large-scale multimodal dataset, EIT-1M, which contains over 1 million EEG-image-text pairs. By recording brain activity while participants view alternating visual and textual stimuli, EIT-1M aims to better reflect how the brain processes multimodal information simultaneously.

### Main Contributions

- **Large-scale dataset**: EIT-1M contains over 1 million EEG-image-text pairs, providing rich multimodal information.
- **High-quality data**: A carefully designed experimental setup and data preprocessing are used to support the quality and reliability of the data.
- **Multimodal information processing**: The dataset covers both visual and textual stimuli, supporting the training and study of multimodal AI models.

### Experimental Validation

To validate the effectiveness of EIT-1M, the paper conducts two experiments:

1. **EEG recognition task**: Recognizing EEG signals evoked by visual stimuli, textual stimuli, or a combination of both.
2. **EEG-to-image generation task**: Generating images from EEG signals.

The experimental results indicate that models trained on EIT-1M perform well on these tasks, with the largest improvements when visual and textual stimuli are combined.
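
To make the pairing of EEG epochs with visual and textual stimuli more concrete, the following is a minimal sketch of how one such record could be represented and filtered by modality for the recognition task. Every name here (`EEGEpoch`, `select`, `epochs_per_category`, the field names, and the sample values) is a hypothetical illustration and not the dataset's actual file format or API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EEGEpoch:
    """One hypothetical EEG epoch paired with the stimulus that evoked it."""
    participant: str
    category: str           # semantic category shared by the image and the text
    modality: str           # "image" or "text": which stimulus was shown
    stimulus: str           # illustrative image path or category text
    eeg: List[List[float]]  # channels x time samples (toy values below)

def select(epochs: List[EEGEpoch], modality: Optional[str] = None) -> List[EEGEpoch]:
    """Return epochs for one modality, or all epochs when modality is None."""
    return [e for e in epochs if modality is None or e.modality == modality]

def epochs_per_category(epochs: List[EEGEpoch]) -> Counter:
    """Count epochs per category, the quantity existing datasets are said to lack."""
    return Counter(e.category for e in epochs)

# Toy data only; real epochs would hold many channels and time samples.
demo = [
    EEGEpoch("p01", "dog", "image", "images/dog_0001.jpg", [[0.1, 0.2], [0.0, 0.3]]),
    EEGEpoch("p01", "dog", "text", "dog", [[0.2, 0.1], [0.1, 0.0]]),
    EEGEpoch("p01", "cat", "image", "images/cat_0001.jpg", [[0.3, 0.3], [0.2, 0.1]]),
]

visual_only = select(demo, "image")  # visual-stimulus recognition setting
both = select(demo)                  # combined visual and textual setting
counts = epochs_per_category(demo)
```

Keeping the modality as a field on each epoch, rather than storing image and text recordings separately, lets the same collection serve the visual-only, text-only, and combined recognition settings described above.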
### Conclusion and Future Work

The paper demonstrates the advantages of the EIT-1M dataset in multimodal information processing, providing a valuable resource for multimodal AI and cognitive neuroscience research. Future work can further expand the scale and diversity of the dataset, explore more application scenarios, and improve data collection and processing methods.