Meta learning based audio tagging.
Kele Xu, Boqing Zhu, Dezhi Wang, Yuxing Peng, Huaimin Wang, Lilun Zhang, Bo Li
2018-01-01
Abstract: In this paper, we describe our solution for the general-purpose audio tagging task, one of the subtasks of the DCASE 2018 challenge. The solution combines deep learning methods with shallow learners based on statistical features. For single models, we test several deep convolutional neural network architectures with different kinds of input, ranging from the raw signal and log-scaled Mel-spectrograms (log Mel) to Mel Frequency Cepstral Coefficients (MFCC). For log Mel and MFCC, the delta and delta-delta information is also used to form three-channel features, and mixup is used for data augmentation. Using ResNeXt, our best single convolutional neural network architecture achieves a mAP@3 of 0.967 on the public Kaggle leaderboard and 0.939 on the private leaderboard. To improve the accuracy further, we also propose a meta learning-based ensemble method. By exploiting the diversity among different architectures, the meta learning-based model provides higher prediction accuracy and robustness than a single model. Our solution achieves a mAP@3 of 0.977 on the public leaderboard and 0.951 as our best on the private leaderboard, while the baseline gives a mAP@3 of 0.704.
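To make two terms from the abstract concrete, the following is a minimal Python sketch of the mAP@3 metric and of mixup augmentation. It is an illustration under stated assumptions, not the authors' code: it assumes each clip has exactly one ground-truth label (so a clip scores 1, 1/2, or 1/3 when the true class is ranked first, second, or third, and 0 otherwise), and it assumes mixup draws its mixing weight from a Beta(alpha, alpha) distribution with alpha = 1.0, a value the abstract does not specify. The function names `map_at_3` and `mixup` are hypothetical.

```python
import random


def map_at_3(true_labels, score_rows):
    """Mean average precision at 3, assuming one true label per clip.

    true_labels: list of int class indices, one per clip.
    score_rows:  list of per-class score lists, one per clip.
    """
    total = 0.0
    for truth, scores in zip(true_labels, score_rows):
        # Indices of the three highest-scoring classes, best first.
        top3 = sorted(range(len(scores)),
                      key=lambda c: scores[c], reverse=True)[:3]
        if truth in top3:
            # Reciprocal of the 1-based rank of the true class.
            total += 1.0 / (top3.index(truth) + 1)
    return total / len(true_labels)


def mixup(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Blend two examples and their one-hot labels with one weight.

    alpha is an assumed hyperparameter; the abstract gives no value.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


if __name__ == "__main__":
    # Toy scores for three clips over four classes.
    scores = [
        [0.7, 0.2, 0.1, 0.0],  # true class 0 ranked first
        [0.1, 0.3, 0.6, 0.0],  # true class 1 ranked second
        [0.4, 0.3, 0.2, 0.1],  # true class 3 outside the top three
    ]
    print(map_at_3([0, 1, 3], scores))

    # Mix two toy feature vectors with one-hot labels.
    print(mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
                rng=random.Random(0)))
```

Under these assumptions, the three toy clips contribute 1, 1/2, and 0, so the metric rewards ranking the correct class near the top rather than only exact top-1 hits. Because the same weight is applied to the inputs and the labels, the mixed label of two one-hot vectors still sums to one.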