Investigation of Data Augmentation Techniques in Environmental Sound Recognition

Anastasios Loukas Sarris, Nikolaos Vryzas, Lazaros Vrysis, Charalampos Dimoulas
DOI: https://doi.org/10.3390/electronics13234719
IF: 2.9
2024-11-30
Electronics
Abstract: Most sound events that occur in everyday life, such as those caused by animals or household devices, belong to the environmental sound family. This audio category has been researched less than music or speech recognition. One main bottleneck in the design of data-driven environmental monitoring automation is the lack of sufficient data representing each of a wide range of categories. For audio data, an important way to increase the available data is to augment existing datasets. This study examines some of the most widespread time-domain data augmentation techniques and their effects on the recognition of environmental sounds, using the UrbanSound8K dataset, which consists of ten classes. The effect of the augmentation was examined using the confusion matrix and the metrics that can be calculated from it. To evaluate the data augmentation techniques, a convolutional neural network architecture trained on the original set was used, and four time-domain augmentation techniques were applied. Although the parameters of the techniques were chosen conservatively, they helped the model to better cluster the data, especially in the four classes in which confusion was high in the initial classification. Furthermore, to address the difficulty that arises when large datasets are augmented, a web application is presented in which the user can upload their own data and apply these data augmentation techniques to both the audio extract and its time-frequency representation, the spectrogram.
Engineering, Electrical & Electronic; Computer Science, Information Systems; Physics, Applied
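
The abstract evaluates augmentation through the confusion matrix and the metrics derived from it. As a minimal sketch of what such metrics typically look like, the Python below computes overall accuracy and per-class precision, recall, and F1 from a confusion matrix. The function names, the row-equals-true-class convention, and the three-class toy matrix are all assumptions made for illustration; they are not taken from the paper, which uses ten classes and does not list its exact metrics here.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total if total else 0.0


def per_class_metrics(cm):
    """Per-class precision, recall and F1.

    Assumed convention: cm[i][j] counts samples whose true class is i
    and whose predicted class is j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true class differs
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


# Hypothetical three-class matrix, invented purely for illustration.
toy_cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]

if __name__ == "__main__":
    print("accuracy:", accuracy(toy_cm))
    for label, m in enumerate(per_class_metrics(toy_cm)):
        print(label, m)
```

Comparing these per-class values before and after augmentation is one way to see whether confusion between specific classes has decreased, which is the kind of effect the abstract reports for the four most confused classes.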