Multi Model-Based Distillation for Sound Event Detection

Yingwei Fu, Kele Xu, Haibo Mi, Qiuqiang Kong, Dezhi Wang, Huaimin Wang, Tie Hong
DOI: https://doi.org/10.1587/transinf.2019edl8062
2019-01-01
IEICE Transactions on Information and Systems
Abstract: Sound event detection aims to identify the sound events in audio recordings and has widespread real-world applications. Recently, convolutional recurrent neural network (CRNN) models have achieved state-of-the-art performance on this task thanks to their ability to learn representative features. However, CRNN models are highly complex, with millions of parameters to train, which limits their use on mobile and embedded devices with limited computational resources. Model distillation is an effective way to transfer the knowledge of a complex model to a smaller one that can be deployed on such devices. In this letter, we propose a novel multi model-based distillation approach for sound event detection that draws on the knowledge of multiple teacher models that are complementary in detecting sound events. Extensive experimental results demonstrate that our approach achieves a compression ratio of about 50, while also obtaining better performance on the sound event detection task.
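The abstract does not specify the loss used for distillation, so the following is only a minimal sketch of one common way to distill from several teachers, not the authors' method. It assumes a multi-label setting (one sigmoid output per sound event class), combines the teachers by simple averaging of their predicted probabilities, and mixes a hard-label term with a soft-target term using a weight `alpha`. The function names, the averaging rule, and `alpha` are all illustrative assumptions.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; targets may be hard (0/1) or soft (in [0, 1])."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


def average_teachers(teacher_probs):
    """Assumed combination rule: per-class mean over all teachers' probabilities."""
    n = len(teacher_probs)
    return [sum(per_class) / n for per_class in zip(*teacher_probs)]


def distillation_loss(student_logits, teacher_probs, labels, alpha=0.5):
    """Hypothetical multi-teacher loss: alpha * hard-label BCE + (1 - alpha) * soft-target BCE."""
    student_probs = [sigmoid(z) for z in student_logits]
    soft_targets = average_teachers(teacher_probs)
    hard_term = bce(student_probs, labels)
    soft_term = bce(student_probs, soft_targets)
    return alpha * hard_term + (1.0 - alpha) * soft_term


# Example: two teachers, three event classes, one audio clip.
teachers = [[0.9, 0.1, 0.6], [0.7, 0.3, 0.8]]
labels = [1.0, 0.0, 1.0]
loss = distillation_loss([2.0, -2.0, 1.0], teachers, labels, alpha=0.5)
```

In this sketch the small student is trained to match both the ground-truth labels and the averaged teacher predictions; other combination rules (for example, weighted averaging of teachers) would fit the same structure.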