Abstract: The goal of acoustic (or sound) event detection (AED or SED) is to predict the temporal position of target events in given audio segments. This task plays a significant role in safety monitoring, acoustic early warning, and other scenarios. However, the scarcity of data and the diversity of acoustic event sources make AED a difficult problem, especially for prevalent data-driven methods. In this paper, we start by analyzing acoustic events according to their time-frequency domain properties, showing that different acoustic events have different time-frequency scale characteristics. Inspired by this analysis, we propose an adaptive multi-scale detection (AdaMD) method. By taking advantage of an hourglass neural network and a gated recurrent unit (GRU) module, AdaMD produces multiple predictions at different temporal and frequency resolutions. An adaptive training algorithm is subsequently adopted to combine the multi-scale predictions and enhance overall detection capability. Experimental results on Detection and Classification of Acoustic Scenes and Events 2017 (DCASE 2017) Task 2, DCASE 2016 Task 3, and DCASE 2017 Task 3 demonstrate that AdaMD outperforms published state-of-the-art competitors in terms of event error rate (ER) and F1-score. A verification experiment on our collected factory mechanical dataset also demonstrates the noise-resistant capability of AdaMD, making it a candidate for deployment in complex environments.
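
The event error rate (ER) and F1-score named in the abstract can be illustrated with a small sketch. The abstract does not say which variant of these metrics is used, so this example assumes the segment-based definitions commonly reported on DCASE tasks: per segment, substitutions are min(FP, FN), deletions are max(0, FN - FP), and insertions are max(0, FP - FN), with ER equal to their total divided by the number of active reference events. The function name and the toy data are hypothetical and are not taken from the paper.

```python
def segment_based_scores(reference, estimate):
    """Compute segment-based ER and F1 (assumed DCASE-style definitions).

    reference, estimate: lists of segments, each a list of 0/1 flags,
    one flag per event class (1 = class active in that segment).
    """
    subs = dels = ins = 0
    tp_total = fp_total = fn_total = 0
    for ref_seg, est_seg in zip(reference, estimate):
        pairs = list(zip(ref_seg, est_seg))
        tp = sum(1 for r, e in pairs if r and e)
        fp = sum(1 for r, e in pairs if e and not r)
        fn = sum(1 for r, e in pairs if r and not e)
        # A miss paired with a false alarm in the same segment is a substitution.
        subs += min(fp, fn)
        dels += max(0, fn - fp)   # remaining misses
        ins += max(0, fp - fn)    # remaining false alarms
        tp_total += tp
        fp_total += fp
        fn_total += fn
    n_ref = tp_total + fn_total   # number of active reference events
    er = (subs + dels + ins) / n_ref if n_ref else float("nan")
    denom = 2 * tp_total + fp_total + fn_total
    f1 = 2 * tp_total / denom if denom else float("nan")
    return er, f1


# Toy example: 3 segments, 2 event classes (hypothetical data).
reference = [[1, 0], [1, 1], [0, 0]]
estimate = [[1, 0], [0, 1], [0, 1]]
er, f1 = segment_based_scores(reference, estimate)
```

In the toy data there is one deletion (segment 2) and one insertion (segment 3) against three reference events, so a lower ER and a higher F1 both indicate better detection.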

Adaptation of Tandem Hidden Markov Models for Non-Speech Audio Event Detection

Agmma: A Novel Incremental Adaptation Method And Its Application To Speaker Recognition

Convolutional bidirectional long short-term memory hidden Markov model hybrid system for polyphonic sound event detection

Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems

MTDA-HSED: Mutual-Assistance Tuning and Dual-Branch Aggregating for Heterogeneous Sound Event Detection

Adaptive Multi-scale Detection of Acoustic Events

Modelling of Sound Events with Hidden Imbalances Based on Clustering and Separate Sub-Dictionary Learning

Double Mixture: Towards Continual Event Detection from Speech

Sound event detection in remote health care - small learning datasets and over constrained Gaussian Mixture Models

Balanced Deep CCA for Bird Vocalization Detection

Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks

Joint analysis of acoustic scenes and sound events based on multitask learning with dynamic weight adaptation

Research and Realisation of Tandem in Mandarin Speech Error Detection System

Audio Content-based Highlight Detection Using Adaptive Hidden Markov Model

Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study

Social Signal Detection by Probabilistic Sampling DNN Training

Anomaly Detection in Audio with Concept Drift using Adaptive Huffman Coding

An Approach for Self-Training Audio Event Detectors Using Web Data

Polyphonic audio event detection: multi-label or multi-class multi-task classification problem?

Spatial-based Bayesian Hidden Markov Models with Dirichlet Mixtures for Video Anomaly Detection