Understanding Audio Pattern Using Convolutional Neural Network From Raw Waveforms

Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das
DOI: https://doi.org/10.48550/arXiv.1611.09524
2016-11-29
Abstract: One key step in audio signal processing is to transform the raw signal into representations that are efficient for encoding the original information. Traditionally, people transform the audio into spectral representations, as a function of frequency, amplitude and phase transformation. In this work, we take a purely data-driven approach to understand the temporal dynamics of audio at the raw signal level. We maximize the information extracted from the raw signal through a deep convolutional neural network (CNN) model. Our CNN model is trained on the UrbanSound8K dataset. We discover that salient audio patterns embedded in the raw waveforms can be efficiently extracted through a combination of nonlinear filters learned by the CNN model.
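To make the idea of nonlinear filters applied directly to raw samples concrete, the following is a minimal sketch of one convolution, ReLU, and max-pooling stage over a waveform. It is not the architecture from the paper: the filter count, kernel size, stride, pool size, sample rate, and the function names `conv1d_relu` and `max_pool1d` are all assumptions chosen for illustration, and the filters are random rather than learned.

```python
import math
import random


def conv1d_relu(waveform, filters, stride=1):
    """Slide each filter over the raw samples (no padding), then apply ReLU.

    waveform: list of floats (mono samples)
    filters:  list of equal-length kernels (lists of floats)
    returns:  one list of non-negative responses per filter
    """
    kernel_size = len(filters[0])
    starts = range(0, len(waveform) - kernel_size + 1, stride)
    return [
        [
            max(0.0, sum(w * x for w, x in zip(kernel, waveform[s:s + kernel_size])))
            for s in starts
        ]
        for kernel in filters
    ]


def max_pool1d(features, pool_size):
    """Non-overlapping max pooling over time; leftover frames are dropped."""
    pooled = []
    for channel in features:
        usable = (len(channel) // pool_size) * pool_size
        pooled.append(
            [max(channel[i:i + pool_size]) for i in range(0, usable, pool_size)]
        )
    return pooled


# Illustrative input: 0.1 s of a 440 Hz tone at an assumed 8 kHz sample rate.
random.seed(0)
sample_rate = 8000
waveform = [
    math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate // 10)
]

# Random stand-ins for learned filters (4 filters of 64 taps, both assumed).
filters = [[random.gauss(0.0, 0.1) for _ in range(64)] for _ in range(4)]

features = max_pool1d(conv1d_relu(waveform, filters, stride=4), pool_size=8)
print(len(features), len(features[0]))
```

In a trained model the filter weights would be fitted by gradient descent on labeled clips, and several such stages would be stacked before a classifier. The sketch only shows how a single stage turns raw samples into a shorter, multi-channel feature sequence without an explicit spectral transform.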
Sound