RealMind: Zero-Shot EEG-Based Visual Decoding and Captioning Using Multi-Modal Models

Dongyang Li, Haoyang Qin, Mingyang Wu, Yuang Cao, Chen Wei, Quanying Liu
2024-10-31
Abstract: Despite significant progress in visual decoding with fMRI data, the high cost and low temporal resolution of fMRI limit its widespread applicability. To address these challenges, we introduce RealMind, a novel EEG-based visual decoding framework that leverages multi-modal models to efficiently interpret semantic information. By integrating semantic and geometric consistency learning, RealMind enhances feature alignment, leading to improved decoding performance. Our framework achieves a 56.73% Top-5 accuracy in a 200-way retrieval task and a 26.59% BLEU-1 score in a 200-way visual captioning task, representing the first successful attempt at zero-shot visual captioning using EEG data. RealMind provides a robust, adaptable, and cost-effective alternative to fMRI-based methods, offering scalable solutions for EEG-based visual decoding in practical applications.
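For readers unfamiliar with the retrieval metric quoted in the abstract, the following is a minimal sketch of how Top-k accuracy in an N-way retrieval task is typically computed. It is not the authors' code. The function names, the toy embeddings, and the use of cosine similarity are illustrative assumptions, since the abstract does not specify the similarity measure. For reference, chance level for Top-5 in a 200-way task is 5/200 = 2.5%.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top_k_accuracy(eeg_embs, img_embs, k=5):
    """Fraction of EEG embeddings whose true image ranks in the top k.

    Assumes eeg_embs[i] corresponds to img_embs[i], and that every EEG
    embedding is compared against all candidate images (N-way retrieval,
    with N = len(img_embs)).
    """
    hits = 0
    for i, eeg in enumerate(eeg_embs):
        sims = [cosine(eeg, img) for img in img_embs]
        # Rank of the true image = number of candidates scoring strictly higher.
        rank = sum(1 for s in sims if s > sims[i])
        if rank < k:
            hits += 1
    return hits / len(eeg_embs)


# Toy 3-way example with hypothetical 2-D embeddings.
eeg = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
img = [[1.0, 0.1], [0.1, 1.0], [1.0, -1.0]]
top1 = top_k_accuracy(eeg, img, k=1)
top3 = top_k_accuracy(eeg, img, k=3)
```

In the toy example the third EEG embedding is orthogonal to its paired image embedding, so it is missed at k=1 and only recovered once k covers all three candidates.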
Human-Computer Interaction, Neurons and Cognition