Abstract: Deep learning has been widely applied to sequence modeling problems. In automatic speech recognition (ASR), performance has improved significantly with larger speech corpora and deeper neural networks. In particular, recurrent neural networks and deep convolutional neural networks have been applied to ASR successfully. To address the resulting problem of training speed, we build a novel deep recurrent convolutional network for acoustic modeling and then apply deep residual learning to it. Our experiments show that it achieves not only faster convergence but also better recognition accuracy than a traditional deep convolutional recurrent network. In the experiments, we first compare the convergence speed of our deep recurrent convolutional networks with that of traditional deep convolutional recurrent networks; our networks converge faster while reaching comparable performance. We further show that applying deep residual learning boosts the convergence speed of our deep recurrent convolutional networks. Finally, we evaluate all experimental networks by phoneme error rate (PER) with our proposed bidirectional statistical n-gram language model. The results show that our proposed deep recurrent convolutional network with deep residual learning reaches the best PER of 17.33% with the fastest convergence on the TIMIT database. This performance indicates that the network could potentially be adopted for other sequential problems.
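
For readers unfamiliar with the metric, the following is a minimal sketch of how a phoneme error rate is commonly computed: the edit distance between reference and hypothesis phoneme sequences, summed over utterances and divided by the total number of reference phonemes. The function names and the toy phoneme labels are illustrative assumptions, not the authors' evaluation code, and the sketch does not include their language model or any TIMIT phoneme-set folding.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of labels)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of a hypothesis phoneme
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Total edit errors divided by total reference length (hypothetical helper)."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_ref = sum(len(r) for r in refs)
    return total_errors / total_ref


# Toy example with made-up labels: one substitution and one deletion.
refs = [["sil", "k", "ae", "t", "sil"]]
hyps = [["sil", "k", "eh", "t"]]
per = phoneme_error_rate(refs, hyps)
print(f"PER = {per:.2%}")
```

Under this definition a PER of 17.33% would mean roughly 17 edit errors per 100 reference phonemes.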

Recent Progresses in Deep Learning based Acoustic Models (Updated)

Toward a Better Understanding of Deep Neural Network Based Acoustic Modelling: An Empirical Investigation

Deep Recurrent Neural Networks for Acoustic Modelling

An Acoustic Model for English Speech Recognition Based on Deep Learning

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends

Acoustic Modeling Based on Deep Learning for Low-Resource Speech Recognition: An Overview

Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition: A comparison of current training strategies

Recent Advances in End-to-End Automatic Speech Recognition

Building DNN acoustic models for large vocabulary speech recognition

Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training

Analyzing deep CNN-based utterance embeddings for acoustic model adaptation

A Survey of Deep Learning Techniques in Speech Recognition

A comparative study on selecting acoustic modeling units in deep neural networks based large vocabulary Chinese speech recognition

Deep neural networks for syllable based acoustic modeling in Chinese speech recognition.

Deep Recurrent Convolutional Neural Network: Improving Performance For Speech Recognition

Advancing Acoustic Howling Suppression through Recursive Training of Neural Networks

An Experimental Study on Speech Enhancement Based on Deep Neural Networks

Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings

Advanced Recurrent Network-Based Hybrid Acoustic Models for Low Resource Speech Recognition