Improved Regularization Techniques for End-to-End Speech Recognition

Yingbo Zhou, Caiming Xiong, Richard Socher
DOI: https://doi.org/10.48550/arXiv.1712.07108
2017-12-20
Abstract: Regularization is important for end-to-end speech models, since the models are highly flexible and prone to overfitting. Data augmentation and dropout have been important for improving end-to-end models in other domains. However, they are relatively underexplored for end-to-end speech models. Therefore, we investigate the effectiveness of both methods for end-to-end trainable, deep speech recognition models. We augment audio data through random perturbations of tempo, pitch, volume, and temporal alignment, and by adding random noise. We further investigate the effect of dropout when applied to the inputs of all layers of the network. We show that the combination of data augmentation and dropout gives a relative performance improvement of over 20% on both the Wall Street Journal (WSJ) and LibriSpeech datasets. Our model's performance is also competitive with other end-to-end speech models on both datasets.
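The two regularizers named in the abstract can be illustrated with a minimal sketch in pure Python. This is not the authors' implementation. It assumes raw 16 kHz waveforms stored as lists of floats and covers only the volume, temporal-alignment, and additive-noise perturbations; tempo and pitch changes need a time-stretching or pitch-shifting tool and are omitted. The exact form of the alignment perturbation (a zero-padded shift) is an assumption. All function names (`apply_gain_db`, `shift_samples`, `add_white_noise`, `augment`, `input_dropout`) and ranges (±6 dB gain, ±1600-sample shift, i.e. ±100 ms at 16 kHz, and 10 to 30 dB SNR) are hypothetical. `input_dropout` shows standard inverted dropout as it might be applied to the input of every layer.

```python
import math
import random
from typing import List


def apply_gain_db(samples: List[float], gain_db: float) -> List[float]:
    """Volume perturbation: scale the amplitude by a gain given in decibels."""
    scale = 10.0 ** (gain_db / 20.0)
    return [s * scale for s in samples]


def shift_samples(samples: List[float], offset: int) -> List[float]:
    """Temporal-alignment perturbation (assumed form): shift the waveform by
    `offset` samples, zero-padding so the length is unchanged. Positive offsets
    delay the audio; negative offsets advance it."""
    n = len(samples)
    if offset >= 0:
        return [0.0] * min(offset, n) + samples[: max(n - offset, 0)]
    k = -offset
    return samples[k:] + [0.0] * min(k, n)


def add_white_noise(samples: List[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add Gaussian white noise scaled to reach the requested signal-to-noise ratio."""
    if not samples:
        return []
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_std = math.sqrt(signal_power / (10.0 ** (snr_db / 10.0)))
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def augment(samples: List[float], rng: random.Random, max_gain_db: float = 6.0,
            max_shift: int = 1600, snr_range_db=(10.0, 30.0)) -> List[float]:
    """Apply one random draw of each perturbation. The default ranges are
    hypothetical, not values reported in the paper."""
    out = apply_gain_db(samples, rng.uniform(-max_gain_db, max_gain_db))
    out = shift_samples(out, rng.randint(-max_shift, max_shift))
    return add_white_noise(out, rng.uniform(*snr_range_db), rng)


def input_dropout(xs: List[float], p: float, rng: random.Random,
                  training: bool = True) -> List[float]:
    """Inverted dropout on a layer's input: zero each element with probability p
    and rescale survivors by 1/(1-p); identity at inference time."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 440 Hz tone at 16 kHz as a stand-in utterance.
    wave = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noisy = augment(wave, rng)
    hidden = input_dropout([0.3, -1.2, 0.8], p=0.2, rng=rng)
    print(len(noisy), hidden)
```

Noise is scaled relative to the measured signal power, so each draw hits its target SNR regardless of how loud the utterance is. Inverted dropout does its rescaling during training, so no correction is needed at inference.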
Computation and Language, Sound, Audio and Speech Processing, Machine Learning