Abstract: In this paper, we present our approach for Track 1 of the Chinese Auditory Attention Decoding (Chinese AAD) Challenge at ISCSLP 2024. Most existing spatial auditory attention decoding (Sp-AAD) methods employ an isolated-window architecture, focusing solely on global invariant features without considering relationships between different decision windows, which can lead to suboptimal performance. To address this issue, we propose a novel streaming decoding architecture, termed StreamAAD. In StreamAAD, decision windows are input to the network as a sequential stream and decoded in order, allowing for the modeling of inter-window relationships. Additionally, we employ a model ensemble strategy, achieving significantly better performance than the baseline and ranking first in the challenge.
What problem does this paper attempt to address?
The paper addresses how to decode a listener's spatial auditory attention more accurately in a complex auditory environment. Most existing spatial auditory attention decoding (Sp-AAD) methods adopt an isolated-window architecture: they focus only on global invariant features and ignore the relationships between different decision windows, which can lead to suboptimal performance. To solve this problem, the paper proposes a new streaming decoding architecture (StreamAAD) that models the relationships between decision windows and thereby improves decoding performance.
### Problem Background
In a complex auditory environment, people with normal hearing can effectively track the sounds of interest by adjusting their attention. However, those with hearing impairments have difficulty communicating effectively in these scenarios even when using hearing aids, because these devices cannot selectively enhance the sounds of interest. For this reason, researchers have proposed the development of intelligent hearing aids, which detect the user's attention focus through neural signals and selectively amplify the desired voice to achieve the goal of "hearing what you want to hear". This technology is known as auditory attention decoding (AAD).
### Existing Problems
Current Sp-AAD methods usually adopt an isolated-window decoding architecture, that is, they divide the EEG signals into multiple independent decision windows and decode each window independently. This approach has two main drawbacks:
1. **Lack of Temporal Information**: It is unable to model the temporal information between different windows, limiting the ability to handle EEG feature drifts and relying only on global invariant features.
2. **Decoding Error**: Due to the lack of correlation between adjacent windows, sudden decoding errors are likely to occur, affecting the user experience.
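The isolated-window scheme described above can be illustrated with a minimal sketch. It is not the paper's code: the single-channel signal, the window length, and `toy_classifier` (a stand-in for a trained network) are all hypothetical and chosen only to show how one noisy window produces a sudden label flip when windows are decoded independently.

```python
from typing import Callable, List, Sequence


def split_windows(signal: Sequence[float], win_len: int) -> List[Sequence[float]]:
    """Cut a 1-D signal into non-overlapping decision windows (incomplete tail dropped)."""
    return [signal[i:i + win_len] for i in range(0, len(signal) - win_len + 1, win_len)]


def toy_classifier(window: Sequence[float]) -> int:
    """Hypothetical stand-in for a trained decoder: 1 if the window mean is positive, else 0."""
    return 1 if sum(window) / len(window) > 0 else 0


def decode_isolated(windows: List[Sequence[float]],
                    classify: Callable[[Sequence[float]], int]) -> List[int]:
    """Classify every window on its own; no information is shared between windows."""
    return [classify(w) for w in windows]


# Toy single-channel "EEG" (real recordings are multi-channel). The third window
# contains one large negative artifact.
signal = [0.5, 0.4, 0.6, 0.5,
          0.3, 0.6, 0.4, 0.5,
          -0.9, 0.1, 0.2, -0.1,
          0.4, 0.5, 0.6, 0.3]

windows = split_windows(signal, 4)
labels = decode_isolated(windows, toy_classifier)
print(labels)  # the third window flips to 0 although its neighbours are 1
```

Because each decision depends only on its own window, nothing in this scheme can smooth over the isolated flip in the third window.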
### Solution
To solve the above problems, the paper proposes a streaming decoding architecture (StreamAAD). In StreamAAD, decision windows are fed into the network as a sequential stream and decoded in order, so that the relationships between different decision windows can be modeled. In addition, the paper adopts a model ensemble strategy, which performs significantly better than the baseline and ranks first in the challenge.
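The summary does not say how the ensemble combines its member models. One common choice, assumed here purely for illustration, is to average the class probabilities predicted by each model and take the most probable class. The probabilities below are made up.

```python
from typing import List


def ensemble_average(prob_sets: List[List[float]]) -> List[float]:
    """Average per-class probabilities over several models (assumed combination rule)."""
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    return [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]


def predict(probs: List[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical outputs of three models for one decision window: [P(left), P(right)]
model_probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]

avg = ensemble_average(model_probs)
label = predict(avg)
print(label)
```

Majority voting over hard labels would be an equally plausible alternative; the averaging rule is only a sketch of the general idea.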
### Technical Details
- **Streaming Decoding Architecture**: Similar to a recurrent neural network (RNN), StreamAAD retains information after processing each decision window and uses it to decode subsequent windows.
- **LSTM-like Decoder**: A long short-term memory (LSTM) network is used to model long- and short-term dependencies, but information is passed at the decision-window level rather than the sampling-point level.
- **Convolutional Block**: Extracts features from the input decision window. It consists of a 1D convolutional layer, a ReLU activation function, a global average pooling layer along the time dimension, and a layer normalization layer.
- **Linear Block**: Processes the short-term memory content. It consists of a fully connected layer and a ReLU activation function.
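The components listed above can be wired together as in the following NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the layer sizes are hypothetical, the weights are random and untrained, the layer normalization omits learned scale and shift, and the way the convolutional features and the linear-block output enter the LSTM gates is a guess, since the summary does not specify it.

```python
import numpy as np

C_IN, T, C_OUT, K, H = 4, 32, 8, 5, 8  # hypothetical sizes, not from the paper


def conv_block(x, w, b):
    """1D conv -> ReLU -> global average pooling over time -> layer norm.
    x: (C_in, T), w: (C_out, C_in, K), b: (C_out,). Returns (C_out,)."""
    c_out, _, k = w.shape
    t_out = x.shape[1] - k + 1                      # 'valid' convolution, stride 1
    y = np.empty((c_out, t_out))
    for t in range(t_out):
        y[:, t] = np.tensordot(w, x[:, t:t + k], axes=([1, 2], [0, 1])) + b
    y = np.maximum(y, 0.0)                          # ReLU
    f = y.mean(axis=1)                              # pool along the time dimension
    return (f - f.mean()) / np.sqrt(f.var() + 1e-5) # layer norm (no learned affine)


def linear_block(h, w, b):
    """Fully connected layer + ReLU applied to the short-term memory h."""
    return np.maximum(w @ h + b, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, w, b):
    """Standard LSTM cell update, applied once per decision window."""
    z = w @ np.concatenate([x, h]) + b
    i, f, g, o = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # long-term memory
    h_new = sigmoid(o) * np.tanh(c_new)               # short-term memory
    return h_new, c_new


def stream_decode(windows, p):
    """Decode windows in order, carrying (h, c) from one window to the next."""
    h, c = np.zeros(H), np.zeros(H)
    labels = []
    for x in windows:
        feat = conv_block(x, p["conv_w"], p["conv_b"])
        mem = linear_block(h, p["lin_w"], p["lin_b"])  # assumed wiring
        h, c = lstm_step(feat, mem, c, p["lstm_w"], p["lstm_b"])
        labels.append(int(np.argmax(p["out_w"] @ h + p["out_b"])))
    return labels


rng = np.random.default_rng(0)
params = {
    "conv_w": 0.1 * rng.standard_normal((C_OUT, C_IN, K)), "conv_b": np.zeros(C_OUT),
    "lin_w": 0.1 * rng.standard_normal((H, H)), "lin_b": np.zeros(H),
    "lstm_w": 0.1 * rng.standard_normal((4 * H, C_OUT + H)), "lstm_b": np.zeros(4 * H),
    "out_w": 0.1 * rng.standard_normal((2, H)), "out_b": np.zeros(2),
}
windows = rng.standard_normal((6, C_IN, T))  # six toy decision windows
labels = stream_decode(windows, params)
print(labels)
```

The point of the sketch is the loop in `stream_decode`: the recurrence runs once per decision window rather than once per EEG sample, so the state carried forward summarizes earlier windows.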
### Experimental Results
The experimental results show that StreamAAD outperforms existing methods in multiple metrics:
- **Decoding Accuracy**: The average decoding accuracy on 8 subjects is 95.26%, which is 18.30% higher than the baseline method DBPNet.
- **Parameters and Computational Cost**: The number of parameters, the number of multiply-accumulate operations (MACs), and the memory usage of StreamAAD are all much lower than those of DBPNet.
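For readers unfamiliar with the cost metrics, the sketch below shows how parameters and MACs are typically counted for a 1D convolutional layer and a fully connected layer. The layer sizes are hypothetical and do not correspond to StreamAAD or DBPNet; the summary gives no layer dimensions.

```python
def conv1d_cost(c_in: int, c_out: int, kernel: int, t_in: int):
    """Parameters and MACs of a stride-1 'valid' 1D convolution with bias."""
    t_out = t_in - kernel + 1
    params = c_out * c_in * kernel + c_out   # weights + biases
    macs = c_out * c_in * kernel * t_out     # one MAC per weight per output step
    return params, macs


def linear_cost(n_in: int, n_out: int):
    """Parameters and MACs of a fully connected layer with bias."""
    return n_in * n_out + n_out, n_in * n_out


# Hypothetical layer: 64 input channels, 16 filters, kernel 9, 128 time samples
conv_params, conv_macs = conv1d_cost(64, 16, 9, 128)
fc_params, fc_macs = linear_cost(16, 2)
print(conv_params, conv_macs, fc_params, fc_macs)
```

Note that the convolution's MAC count scales with the window length while its parameter count does not, which is why the two metrics are reported separately.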
### Conclusion
The StreamAAD architecture proposed in the paper significantly improves the performance of spatial auditory attention decoding by modeling the relationships between different decision windows, while keeping resource consumption low. With a model ensemble strategy, the decoding accuracy on the official test set is further improved, and the approach ranks first in Track 1 of the ISCSLP 2024 Chinese AAD Challenge.