ZHENG USTC TEAM’S SUBMISSION FOR DCASE2021 TASK4: SEMI-SUPERVISED SOUND EVENT DETECTION
Technical Report

Xu Zheng, Yan Song
2021-01-01
Abstract: In this technical report, we present our submitted system for DCASE2021 Task4: sound event detection and separation in domestic environments. Three main techniques are applied to improve on the official baseline system, using both synthetic and real data (weakly labeled and unlabeled). First, to improve the localization ability of the CRNN model, we propose to use the selective kernel (SK) unit. By stacking SK units, each neuron can adaptively adjust its receptive field to suit both short- and long-duration events. Second, because the detection outputs are dominated by high-confidence predictions (lower than 0.1 or higher than 0.9), we propose to produce soft detection outputs by setting a proper temperature parameter in the sigmoid, which effectively improves the PSDS2 score. Third, several data augmentation techniques and score fusion mechanisms are applied to improve the stability and robustness of the system. Experiments on the DCASE2021 Task4 validation set demonstrate the effectiveness of these techniques: our system achieves PSDS scores of 0.45 and 0.78 for scenario 1 and scenario 2 respectively, outperforming the baseline system's 0.34 and 0.53.
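The abstract does not say exactly how the temperature enters the sigmoid. A common form is σ(z/T), where T > 1 flattens the output toward 0.5. The minimal Python sketch below assumes that form and applies it at inference only. The function names, the temperature T = 3, and the frame logits are hypothetical illustrations, since the report gives none of them. The sketch also counts the fraction of frame probabilities below 0.1 or above 0.9, which is the "high-confidence" band the abstract describes.

```python
import math


def tempered_sigmoid(logit: float, temperature: float = 1.0) -> float:
    """Sigmoid of logit / temperature; temperature > 1 pushes outputs toward 0.5.

    Assumed form sigma(z / T); the report does not give its exact formulation.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    z = logit / temperature
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def high_confidence_fraction(probs, low=0.1, high=0.9):
    """Fraction of frame-level probabilities below `low` or above `high`."""
    probs = list(probs)
    if not probs:
        return 0.0
    return sum(1 for p in probs if p < low or p > high) / len(probs)


# Hypothetical frame-level logits for one event class (not taken from the report).
logits = [-6.0, -3.5, -0.8, 0.4, 2.7, 5.2]

hard = [tempered_sigmoid(x, 1.0) for x in logits]  # plain sigmoid
soft = [tempered_sigmoid(x, 3.0) for x in logits]  # T = 3 is an illustrative choice

print(high_confidence_fraction(hard))  # → 0.6666666666666666
print(high_confidence_fraction(soft))  # → 0.0
```

Because σ(z/T) is monotonic in z, the ranking of frames is unchanged. Only the spread of probabilities across decision thresholds changes, which matters for a metric such as PSDS that is computed over a range of operating thresholds.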