An Attention-Based Time-Frequency Pyramid Pooling Strategy in Deep Convolutional Networks for Acoustic Scene Classification

Pengxu Jiang, Yang Yang, Cairong Zou, Qingyun Wang
DOI: https://doi.org/10.1109/lsp.2024.3350809
2024-02-02
IEEE Signal Processing Letters
Abstract: Convolutional neural networks (CNNs) are frequently employed in acoustic scene classification (ASC) because they can capture time-frequency information from the spectrum. However, practical ASC systems must also limit model complexity, and under this constraint the convolutional operations often capture time-frequency information only imprecisely. This paper proposes a low-complexity time-frequency pyramid pooling (TFPP) strategy for ASC that is incorporated into the convolutional layers. TFPP obtains a local time-frequency representation by performing a multi-scale pyramid computation along both dimensions (time and frequency) of the input feature map. Moreover, integrating multiple TFPP layers facilitates the fusion of time-frequency information drawn from feature maps of different resolutions. Experiments on three ASC databases demonstrate the effectiveness of the proposed CNN-TFPP model for ASC tasks.
engineering, electrical & electronic
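The abstract does not give the TFPP equations. As a rough illustration only, the pure-Python sketch below shows generic multi-scale pyramid average pooling applied separately along the frequency and time axes of a single-channel feature map, with the two pooled contexts fused by a sigmoid gate. The function names (`segment_bounds`, `pyramid_pool_axis`, `tf_pyramid_gate`), the use of average pooling, the averaging over scales, and the gating step are all assumptions (the gate is only loosely suggested by "attention" in the title); this is not the authors' CNN-TFPP layer.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # feature map stored as x[f][t] (frequency x time)


def segment_bounds(length: int, bins: int) -> List[int]:
    """Boundaries that split range(length) into `bins` contiguous, non-empty segments."""
    if not 1 <= bins <= length:
        raise ValueError("need 1 <= bins <= length")
    return [(i * length) // bins for i in range(bins + 1)]


def pyramid_pool_axis(x: Matrix, axis: int, scales: Sequence[int]) -> Matrix:
    """Average-pool along one axis (0 = frequency, 1 = time) at several scales.

    At scale s the chosen axis is split into s segments and every entry is
    replaced by the mean of its segment (the other axis is left untouched).
    The per-scale maps are averaged, so the output keeps the input's shape.
    """
    n_f, n_t = len(x), len(x[0])
    out = [[0.0] * n_t for _ in range(n_f)]
    length = n_f if axis == 0 else n_t
    for s in scales:
        b = segment_bounds(length, s)
        for lo, hi in zip(b[:-1], b[1:]):
            if axis == 0:  # pool over frequency rows, separately for each time step
                for t in range(n_t):
                    m = sum(x[f][t] for f in range(lo, hi)) / (hi - lo)
                    for f in range(lo, hi):
                        out[f][t] += m / len(scales)
            else:  # pool over time columns, separately for each frequency bin
                for f in range(n_f):
                    m = sum(x[f][lo:hi]) / (hi - lo)
                    for t in range(lo, hi):
                        out[f][t] += m / len(scales)
    return out


def tf_pyramid_gate(x: Matrix, scales: Sequence[int] = (1, 2)) -> Matrix:
    """Hypothetical fusion: scale x by sigmoid(frequency context + time context)."""
    pf = pyramid_pool_axis(x, axis=0, scales=scales)
    pt = pyramid_pool_axis(x, axis=1, scales=scales)
    # x * sigmoid(z) written as x / (1 + exp(-z))
    return [
        [x[f][t] / (1.0 + math.exp(-(pf[f][t] + pt[f][t]))) for t in range(len(x[0]))]
        for f in range(len(x))
    ]
```

For example, `pyramid_pool_axis([[1.0, 3.0], [5.0, 7.0]], axis=0, scales=(1, 2))` averages the whole-axis mean (scale 1) with the unpooled values (scale 2) and returns `[[2.0, 4.0], [4.0, 6.0]]`. In an actual CNN such pooling would run per channel on convolutional feature maps inside a deep-learning framework; pure Python is used here only to keep the sketch dependency-free.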