Cross-Attention-Guided Wavenet for Mel Spectrogram Reconstruction in the ICASSP 2024 Auditory EEG Challenge

Yuan Fang, Hao Li, Xueliang Zhang, Fei Chen, Guanglai Gao
DOI: https://doi.org/10.1109/icasspw62465.2024.10627006
2024-01-01
Abstract: This paper provides an overview of our submission to Task 2 of the Auditory EEG Challenge at the ICASSP 2024 Signal Processing Grand Challenge (SPGC). We introduce a novel approach that employs a cross-attention-guided WaveNet with a coarse-to-fine generation strategy to improve the detailed reconstruction of Mel spectrograms from time-domain EEG. Specifically, the model uses WaveNet to reconstruct, in sequence from coarse to fine granularity, the envelope, the 10-band Mel, the 80-band Mel, and the magnitude. To bridge the gap between modalities, we introduce a cross-attention mechanism that explores cross-modal correlations. A combined loss function is employed to refine the reconstruction performance. We achieved Pearson correlation values of 0.0651 ± 0.0153 on the validation set and 0.0413 ± 0.0169 on the held-out-subjects test set, securing second place in the competition. We release the training code for our model online.
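The reported scores are Pearson correlations between reconstructed and reference Mel spectrograms. As a rough illustration of how such a score can be computed, the sketch below correlates each Mel band along the time axis and averages over bands. The function names, the (time, bands) array layout, and the band-averaging step are assumptions made for this example; the abstract does not describe the challenge's exact scoring procedure (windowing, per-subject averaging), so this should not be read as the official metric.

```python
import numpy as np

def pearson_per_band(pred, target, eps=1e-8):
    """Pearson correlation along time, computed separately for each band.

    pred, target: arrays of shape (time, bands). This layout is an
    assumption for the example, not taken from the paper.
    """
    # Remove the per-band mean over time.
    pred_c = pred - pred.mean(axis=0, keepdims=True)
    target_c = target - target.mean(axis=0, keepdims=True)
    # Covariance term divided by the product of the per-band norms.
    num = (pred_c * target_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (target_c ** 2).sum(axis=0)) + eps
    return num / den

def mel_score(pred, target):
    """Average the per-band correlations into one scalar (assumed aggregation)."""
    return float(pearson_per_band(pred, target).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal((640, 10))  # e.g. a 10-band Mel segment
    noisy = reference + 5.0 * rng.standard_normal(reference.shape)
    print(mel_score(noisy, reference))
```

Because Pearson correlation is invariant to scale and offset, a reconstruction that tracks the temporal shape of each band scores well even if its absolute level is off, which is one reason correlation-based scores are often paired with other terms in a combined loss.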