Abstract: Most sound events in everyday life, such as those produced by animals or household devices, belong to the environmental sound family. This audio category has received less research attention than music or speech recognition. A main bottleneck in designing data-driven environmental monitoring systems is the lack of sufficient data for each of a wide range of categories. For audio data, an important way to increase the available data is to augment existing datasets. This study examines some of the most widespread time domain data augmentation techniques and their effects on the recognition of environmental sounds, using the UrbanSound8K dataset, which consists of ten classes. The effect of the augmentation was examined through the confusion matrix and the metrics that can be calculated from it. To evaluate the augmentation techniques, a convolutional neural network architecture trained on the original set was used, and four time domain augmentation techniques were applied. Although the parameters of the techniques were chosen conservatively, they helped the model to better cluster the data, especially in the four classes in which confusion was high in the initial classification. Finally, to address the difficulty that arises when large datasets are augmented, a web-based data augmentation application was created, in which users can upload their own data and apply these augmentation techniques to both the audio excerpt and its time-frequency representation, the spectrogram.
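To make the idea of time domain augmentation concrete, the following is a minimal NumPy sketch of four operations that act directly on the waveform. The abstract does not name the four techniques used in the study, so the choice here (noise addition, time shifting, gain change, and a naive time stretch) is an assumption, as are all function names and parameter values; it is not the authors' implementation.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def time_shift(signal, shift_samples):
    """Circularly shift the waveform; positive values delay it."""
    return np.roll(signal, shift_samples)


def change_gain(signal, gain_db):
    """Scale the amplitude by a gain given in decibels."""
    return signal * (10 ** (gain_db / 20))


def time_stretch(signal, rate):
    """Naive stretch by linear-interpolation resampling.

    rate > 1 shortens the clip, rate < 1 lengthens it. Unlike a phase
    vocoder, this simple approach also shifts the pitch.
    """
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 22050  # hypothetical sample rate
    t = np.arange(sr) / sr
    clip = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1-second test tone

    augmented = [
        add_noise(clip, snr_db=20.0, rng=rng),
        time_shift(clip, shift_samples=sr // 10),
        change_gain(clip, gain_db=-6.0),
        time_stretch(clip, rate=1.25),
    ]
    for out in augmented:
        print(out.shape)
```

Conservative settings, such as a high SNR or a small shift, keep the augmented clip perceptually close to the original, which is in the spirit of the conservative parameter choices the abstract describes.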

Data Independent Sequence Augmentation Method for Acoustic Scene Classification

PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription

Long-term scalogram integrated with an iterative data augmentation scheme for acoustic scene classification

A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection

Adaptive data augmentation for Mandarin automatic speech recognition

Multi-modal Attention Mechanisms in LSTM and Its Application to Acoustic Scene Classification

Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision

Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling

Semantic Data Augmentation for End-to-End Mandarin Speech Recognition

Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition

Data Augmentation for End-to-end Code-switching Speech Recognition

A Study on Joint Modeling and Data Augmentation of Multi-Modalities for Audio-Visual Scene Classification

Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

Sample-aware Data Augmentor for Scene Text Recognition

Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis

Metric Learning Based Data Augmentation for Environmental Sound Classification

Investigation of Data Augmentation Techniques in Environmental Sound Recognition

Acoustic data augmentation for small passive acoustic monitoring datasets

ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification

A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition