Dynamic TF-TDNN: Dynamic Time Delay Neural Network Based on Temporal-Frequency Attention for Dialect Recognition
Chao Liao,Jinwen Huang,Huan Yuan,Peng Yao,Jianchao Tan,Dawei Zhang,Feng Deng,Xiaorui Wang,Chengru Song
DOI: https://doi.org/10.1109/icassp49357.2023.10096335
2023-01-01
ICASSP
Abstract:Dialect recognition aims to recognize dialect categories in utterances, which has been applied in many audio applications. Recently, various Time Delayed Neural Network (TDNN) based AI models are proposed to solve dialect recognition problems, such as D-TDNN, DMC-TDNN, and ECAPA-TDNN, however, most of them only perform temporal attention in the last statistical pooling layer of the TDNN network, which ignores the importance of simultaneously capturing both frequency and temporal key information in utterances under different receptive fields. In contrast, we introduce a hybrid attention mechanism in both the temporal and frequency domain, called the TF-attention module, which adaptively pays more attention to the indeed important frames and the frame-level important information under different receptive fields for dialect recognition. Moreover, we are the first to introduce a dynamic architecture mechanism in the field of dialect recognition to dynamically reduce the computational cost and the number of parameters of models. We evaluate the proposed dynamic TF-TDNN on the OLR challenge AP20-OLR-dialect task and achieve State-Of-The-Art (SOTA) performance with fewer model parameters.