EEGCiD: EEG Condensation into Diffusion Model
Junfu Chen,Dechang Pi,Xiaoyi Jiang,Feng Gao,Bi Wang,Yang Chen
DOI: https://doi.org/10.1109/tase.2024.3486203
IF: 6.636
2024-01-01
IEEE Transactions on Automation Science and Engineering
Abstract: Electroencephalography (EEG)-based applications in Brain-Computer Interfaces (BCIs), neurological disease diagnosis, rehabilitation, and other areas rely on large amounts of data for model development. This raises storage and privacy concerns, because large datasets must be stored and shared EEG recordings disclose sensitive information such as identity and health. To address this problem, we introduce the paradigm of EEG condensation, which aims to generate a synthetic sample set that is highly information-concentrated yet not visually similar to the original recordings. Correspondingly, we propose EEGCiD, a novel dataset condensation framework in which the knowledge of the original EEG dataset is condensed into a diffusion model. Specifically, EEGCiD first uses a deterministic denoising diffusion implicit model (DDIM) to store the information of the original dataset, and then optimizes the condensation latent codes $z$ to obtain the condensed EEG dataset. To enhance the modeling of EEG knowledge in the DDIM, we design a transformer architecture incorporating a spatial and temporal self-attention (STSA) block to replace the traditional U-Net backbone. In the condensation phase, EEGCiD randomly selects a subset of samples from the original dataset and passes them through the DDIM forward process to initialize the condensation latent codes $z$. It then optimizes $z$ by matching the feature distributions of the synthetic samples and the original dataset in multiple EEG decoding models. Extensive experiments across three EEG datasets demonstrate that the condensed dataset produced by the proposed model not only achieves superior classification performance with limited sample sizes, but also effectively prevents membership inference attacks (MIA).
Note to Practitioners: This paper investigates a novel EEG generation paradigm that extracts representative synthetic samples from large-scale datasets. Existing studies in EEG generation primarily concentrate on generating realistic signals, and some work claims that the generated EEG can serve as a substitute for the original dataset to achieve privacy preservation. In the EEGCiD framework, the deterministic DDIM is pre-trained on the original dataset to store its knowledge. In addition, an ensemble feature matching strategy is proposed to condense the information from the original dataset into a small set of latent codes. Experiments on three datasets demonstrate that EEGCiD addresses two fundamental challenges: 1) obtaining superior classification performance with a small dataset (limited storage capacity); 2) avoiding potential privacy issues during EEG sharing and transmission.
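The ensemble feature matching idea can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the loss, so the squared distance between mean features, summed over extractors, is an assumption. The names `mean_feature`, `ensemble_matching_loss`, `amplitude_stats`, and `roughness` are hypothetical, and the two hand-written extractors are toy stand-ins for trained EEG decoding models. In EEGCiD the synthetic samples would be decoded from the latent codes $z$ by the pre-trained DDIM and the loss would be minimized with respect to $z$; here the loss is only evaluated on fixed toy signals.

```python
import math
import random


def mean_feature(samples, extractor):
    """Average the feature vectors that `extractor` produces over `samples`."""
    feats = [extractor(x) for x in samples]
    dim = len(feats[0])
    return [sum(f[d] for f in feats) / len(feats) for d in range(dim)]


def ensemble_matching_loss(real, synthetic, extractors):
    """Sum, over all extractors, of the squared distance between the mean
    feature of the real set and the mean feature of the synthetic set.
    (Assumed form of the matching objective, not taken from the paper.)"""
    loss = 0.0
    for extractor in extractors:
        mu_real = mean_feature(real, extractor)
        mu_syn = mean_feature(synthetic, extractor)
        loss += sum((a - b) ** 2 for a, b in zip(mu_real, mu_syn))
    return loss


# Hypothetical stand-ins for EEG decoding models (toy feature extractors).
def amplitude_stats(x):
    """Mean amplitude and mean energy of a single-channel signal."""
    n = len(x)
    return [sum(x) / n, sum(v * v for v in x) / n]


def roughness(x):
    """Mean absolute first difference of the signal."""
    n = len(x) - 1
    return [sum(abs(x[i + 1] - x[i]) for i in range(n)) / n]


extractors = [amplitude_stats, roughness]

# Toy "original dataset": noisy sinusoids standing in for EEG trials.
rng = random.Random(0)
real = [
    [math.sin(0.3 * t) + rng.gauss(0.0, 0.1) for t in range(64)]
    for _ in range(50)
]

# Two candidate small sets: a real subset versus uninformative flat signals.
subset = real[:5]
flat = [[0.0] * 64 for _ in range(5)]

loss_subset = ensemble_matching_loss(real, subset, extractors)
loss_flat = ensemble_matching_loss(real, flat, extractors)
print(loss_subset, loss_flat)
```

A small set whose features resemble those of the full dataset yields a lower loss than an uninformative one, which is the signal a condensation procedure would follow when updating $z$. Using several extractors rather than one is meant to keep the condensed set from overfitting to a single decoder's feature space.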