Hand-crafted Attention is All You Need? A Study of Attention on Self-supervised Audio Transformer

Wu Tsung-Han, Hsieh Chun-Chen, Chen Yen-Hao, Chi Po-Han, Lee Hung-yi
2020-01-01
Abstract: In this paper, we seek to reduce the computational complexity of transformer-based models for speech representation learning. We evaluate 10 attention mechanisms; we then pre-train a transformer-based model with each of them in a self-supervised fashion and use the pre-trained models as feature extractors on downstream tasks, including phoneme classification and speaker classification. We find that the proposed approach, which uses only hand-crafted and learnable attentions, performs comparably to full self-attention.