Abstract: Sound event detection (SED) plays an important role in understanding the sounds in different environments. Recent studies on standardized datasets show the scientific community's growing interest in the SED problem; however, they have not paid sufficient attention to the detection of artificial and natural sounds. To address this gap, this article combines different features to detect machine-generated and natural sounds. We trained and compared a Stacked Convolutional Recurrent Neural Network (S-CRNN), a Convolutional Recurrent Neural Network (CRNN), and an Artificial Neural Network (ANN) classifier on the DCASE 2017 Task-3 dataset. Relative spectral–perceptual linear prediction (RASTA-PLP) and Mel-frequency cepstral coefficient (MFCC) features are used as input to the proposed multi-model. We also compare monaural and binaural inputs to the classifier. In the proposed S-CRNN model, the sound events in the dataset are classified into two sub-classes. Compared with the baseline model, the PLP-based ANN classifier improves the individual error rate (ER) for each sound event: for example, the ER improves to 0.23 for heavy vehicle events and 0.32 for people walking, with minor gains for the other events. The proposed CRNN performs well compared with both the baseline and the proposed ANN model. Moreover, in cross-validation trials, the evaluation-stage results demonstrate a significant improvement over the best performance of DCASE 2017 Task-3, reducing the ER to 0.11 and increasing the F1-score by 10% on the evaluation dataset. Erosion and dilation were used during post-processing.
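As a rough illustration of the last two points, the sketch below smooths binary frame-level predictions with erosion followed by dilation, and computes the segment-based ER and F1-score in the form commonly used for DCASE 2017 Task-3. The abstract does not state the kernel length, the order of the morphological operations, or the segment length, so the kernel of 3 frames, the erode-then-dilate order, and the function names `erode`, `dilate`, `smooth_predictions`, and `segment_metrics` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def _windows(x, k):
    """Stack k time-shifted copies of x (frames x classes), zero-padded at the edges."""
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), constant_values=0)
    return np.stack([padded[i:i + x.shape[0]] for i in range(k)])


def erode(x, k=3):
    """Binary erosion along time: a frame stays active only if its whole window is active."""
    return _windows(x, k).min(axis=0)


def dilate(x, k=3):
    """Binary dilation along time: a frame becomes active if any frame in its window is active."""
    return _windows(x, k).max(axis=0)


def smooth_predictions(x, k=3):
    """Erosion then dilation (assumed order): removes activations shorter than k frames."""
    return dilate(erode(x, k), k)


def segment_metrics(ref, est):
    """Segment-based error rate and F1 for binary (segments x classes) activity matrices."""
    ref = np.asarray(ref, dtype=bool)
    est = np.asarray(est, dtype=bool)
    tp = np.sum(ref & est)
    fp = np.sum(~ref & est, axis=1)   # false positives per segment
    fn = np.sum(ref & ~est, axis=1)   # false negatives per segment
    subs = np.minimum(fn, fp)         # substitutions: a miss paired with a false alarm
    dels = np.maximum(0, fn - fp)     # deletions: remaining misses
    ins = np.maximum(0, fp - fn)      # insertions: remaining false alarms
    n_ref = ref.sum()                 # total active reference events
    er = (subs.sum() + dels.sum() + ins.sum()) / n_ref if n_ref else float("nan")
    denom = 2 * tp + fp.sum() + fn.sum()
    f1 = 2 * tp / denom if denom else float("nan")
    return float(er), float(f1)
```

With a 3-frame kernel, an isolated single-frame activation is removed while a run of three or more active frames is preserved. Note that ER can exceed 1.0 when the system produces many insertions, which is why it is reported alongside the F1-score.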

Multi Model-Based Distillation for Sound Event Detection

A Mobile Application for Sound Event Detection

Attention mechanism combined with residual recurrent neural network for sound event detection and localization

Compression of Acoustic Event Detection Models with Quantized Distillation

MTF-CRNN: Multiscale Time-Frequency Convolutional Recurrent Neural Network for Sound Event Detection

Research on Knowledge Distillation Algorithm of Object Detection

Multi-Scale Convolutional Recurrent Neural Network with Ensemble Method for Weakly Labeled Sound Event Detection

Multi-Scale Convolution Based Attention Network for Semi-Supervised Sound Event Detection Technical Report

Multi-Scale and Single-Scale Fully Convolutional Networks for Sound Event Detection

Multi-Scale Recurrent Neural Network for Sound Event Detection

Convolutional Recurrent Neural Networks with Multi-Sized Convolution Filters for Sound-Event Recognition

Multi-Representation Knowledge Distillation for Audio Classification

Sound event detection via dilated convolutional recurrent neural networks

Weakly and semi-supervised learning for sound event detection using image pretrained convolutional recurrent neural network, weighted pooling and mean teacher method

NAS-DYMC: NAS-Based Dynamic Multi-Scale Convolutional Neural Network for Sound Event Detection

An Adversarial Feature Distillation Method for Audio Classification

Dual Knowledge Distillation for Efficient Sound Event Detection

Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement

Improved Multi-Model Classification Technique for Sound Event Detection in Urban Environments

Dilated-Gated Convolutional Neural Network with A New Loss Function on Sound Event Detection