Adaptive DCTNet for Audio Signal Classification

Yin Xian, Yunchen Pu, Zhe Gan, Liang Lu, Andrew Thompson
DOI: https://doi.org/10.1121/1.4970932
2017-04-30
Abstract: In this paper, we investigate DCTNet for audio signal classification. Its output feature is related to Cohen's class of time-frequency distributions. We introduce the adaptive DCTNet (A-DCTNet) for audio signal feature extraction. The A-DCTNet applies the idea of the constant-Q transform, with the center frequencies of its filterbanks geometrically spaced. The A-DCTNet is adaptive to different acoustic scales, and it captures low-frequency acoustic information, to which human audio perception is sensitive, better than features such as Mel-frequency spectral coefficients (MFSC). We use features extracted by the A-DCTNet as input for classifiers. Experimental results show that the A-DCTNet combined with Recurrent Neural Networks (RNN) achieves a state-of-the-art classification rate for bird songs and improves artist identification accuracy on music data. These results demonstrate the A-DCTNet's applicability to signal processing problems.
Sound
What problem does this paper attempt to address?
This paper attempts to solve several key problems in audio signal classification:

1. **Improvement of feature representation**: Although traditional audio signal features (such as Mel-Frequency Cepstral Coefficients (MFCC) and ERB-rate scale features) can reveal the intrinsic properties of audio signals, they are sensitive to noise. The paper proposes a new feature extraction method, Adaptive DCTNet (A-DCTNet), which aims to improve the robustness and effectiveness of the feature representation.
2. **Capture of low-frequency information**: The human auditory system is particularly sensitive to low-frequency information, while traditional methods capture it poorly. A-DCTNet captures low-frequency acoustic information better by using filters with geometrically spaced center frequencies, which improves the quality of the feature representation.
3. **Optimization of time-frequency analysis**: The paper proves that the output of the two-layer DCTNet belongs to Cohen's class of time-frequency distributions, which gives DCTNet a sound theoretical basis in time-frequency analysis. This property enables DCTNet to handle the complex structure of audio signals effectively.
4. **Combination with Recurrent Neural Network (RNN)**: To exploit the sequential information in audio signals, the paper combines A-DCTNet with an RNN, which further improves classification performance. The RNN can capture long-term dependencies in audio signals, and the combination achieves state-of-the-art classification results on music data and bird song data.

In summary, this paper focuses on improving the accuracy and robustness of audio signal classification through a better feature extraction method, with particular contributions in low-frequency information capture and time-frequency analysis.
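
The geometric spacing of center frequencies mentioned above can be illustrated with a short sketch. It uses the standard constant-Q relation f_k = f_min · 2^(k/B), where B is the number of bins per octave. The function names, the parameter values (55 Hz, 12 bins per octave, 48 bins), and the comparison against a linear grid are all assumptions chosen for illustration; they are not taken from the paper, and this is not the authors' A-DCTNet implementation.

```python
import numpy as np


def geometric_center_frequencies(f_min, n_bins, bins_per_octave):
    """Return n_bins center frequencies f_k = f_min * 2**(k / bins_per_octave).

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    k = np.arange(n_bins)
    return f_min * 2.0 ** (k / bins_per_octave)


def constant_q_bandwidths(freqs, bins_per_octave):
    """Return bandwidths f_k / Q, with Q = 1 / (2**(1/B) - 1) shared by all bins."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    return freqs / q


# Illustrative values only (assumed, not from the paper).
f_min, n_bins, bins_per_octave = 55.0, 48, 12
geo = geometric_center_frequencies(f_min, n_bins, bins_per_octave)

# A linearly spaced grid over the same range, for comparison.
lin = np.linspace(geo[0], geo[-1], n_bins)

# Count how many center frequencies fall in the lowest two octaves.
cutoff = 4.0 * f_min
n_low_geo = int(np.sum(geo < cutoff))
n_low_lin = int(np.sum(lin < cutoff))
```

With geometric spacing the ratio between neighboring center frequencies is constant, so every octave receives the same number of filters. Under these assumed settings the geometric grid places more center frequencies in the lowest two octaves than the linear grid does, which is one way to read the claim that the approach resolves low-frequency content more finely.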