Sparse Self-Attention for Semi-Supervised Sound Event Detection

Yadong Guan, Jiabin Xue, Guibin Zheng, Jiqing Han
DOI: https://doi.org/10.1109/icassp43922.2022.9747834
2022-01-01
Abstract: The self-attention mechanism has been widely employed in semi-supervised sound event detection (SS-SED). Because self-attention captures dependencies between every pair of features across all time steps, its weighted summation inevitably mixes irrelevant features into the current embedding, namely features of other sound classes and of background sounds at other moments. These irrelevant features weaken the ability of the aggregated embedding to describe sound events. In this paper, we propose a sparse self-attention mechanism to alleviate this problem. Specifically, the Sparsemax function is introduced to normalize the attention weights; it uses a Euclidean projection to map the attention weights onto the probability simplex. After normalization, the attention weights of irrelevant features are projected onto the boundary of the simplex, where they become exactly zero and are thereby removed. Furthermore, to address the excessive sparsity of Sparsemax, we propose a Sparsemax with adjustable sparsity. Experimental results demonstrate the effectiveness of the proposed method.
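To make the projection step concrete, the following is a minimal NumPy sketch of the standard Sparsemax function applied to one row of attention scores, compared against softmax. It is an illustration under stated assumptions, not the authors' implementation: the function names, the example scores, and in particular the `temperature` parameter are hypothetical. The abstract does not say how the adjustable-sparsity variant works; dividing the scores by a temperature greater than 1 is used here only as one simple, well-known way to make Sparsemax less sparse.

```python
import numpy as np

def sparsemax(z, temperature=1.0):
    """Euclidean projection of a score vector onto the probability simplex.

    `temperature` is a hypothetical knob (not taken from the paper):
    values above 1 shrink the scores and yield a less sparse output.
    """
    z = np.asarray(z, dtype=float) / temperature
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    cumsum = np.cumsum(z_sorted)           # running sums of sorted scores
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum    # True for entries kept in the support
    k_max = k[support][-1]                 # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max  # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)        # scores below tau become exactly zero

def softmax(z):
    """Standard softmax, for comparison: every weight stays strictly positive."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical attention scores of one query frame against four key frames.
scores = np.array([2.0, 1.5, 0.1, -1.0])

dense = softmax(scores)                      # all four weights are nonzero
sparse = sparsemax(scores)                   # low-scoring frames get zero weight
softer = sparsemax(scores, temperature=4.0)  # fewer zeros than `sparse`
```

With these example scores, softmax assigns a positive weight to every frame, whereas Sparsemax places the two low-scoring frames on the boundary of the simplex (weight exactly zero), so they contribute nothing to the weighted sum. Raising the hypothetical temperature keeps more frames in the support.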