Abstract: To overcome some weaknesses of the hidden Markov model (HMM) in speech recognition, many researchers have explored HMM/NN hybrid systems in recent years. In previous HMM/NN hybrid systems, the neural network adopted is mostly the multilayer perceptron (MLP). In our system, a recurrent neural network (RNN) takes the place of the MLP as the syllable probability estimator. An RNN is an MLP with feedback connections that carry the output of some neurons to other neurons or back to themselves. Adding feedback to an MLP lets the network process the context information of a time sequence efficiently, which is especially useful for speech recognition. In this paper, the architecture of the RNN is modified and a corresponding training scheme is presented.

The following techniques have been adopted in our system.

1. A single-layer network is adopted, but the content of the feedback differs from that of the networks used by previous researchers: the external output is included in the feedback, not just the internal state output.
2. The training algorithm adopted in our system is back-propagation through time (BPTT). In the common BPTT algorithm, the initial feedback values are set arbitrarily according to experience, so they are not specific to the problem at hand. It is therefore preferable to train the initial feedback values as well. In our training algorithm, this is achieved by adding an additional layer to the unfolded network.
3. To train the network, proper target values must be given. To acquire them, we make use of HMMs that have been trained to recognize the same syllables. This method avoids the difficulty and inaccuracy of hand-set teacher signals, and it gives a smooth transition between two adjacent states.
4. To make the network learn faster and generalize better, the network is trained in stages. At first, short fragments of the speech sequences are presented. Once a small enough error has been achieved on these short fragments, longer fragments are used for training. Finally, whole sequences are learned.

Experimental results show that the method accelerates training and also improves recognition performance.
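Techniques 2 and 4 can be illustrated with a toy example. The sketch below is not the authors' system: it assumes a scalar RNN with a single tanh unit whose output is fed back directly, treats the initial feedback `h0` as an ordinary trainable parameter (rather than an extra layer in the unfolded network), and uses targets generated by a hypothetical "teacher" RNN instead of HMM-derived targets. All names, parameter values, and fragment lengths are illustrative assumptions.

```python
import math

def forward(params, xs):
    """Run the scalar RNN and return [h0, h1, ..., hT]; each output is fed back."""
    w_x, w_h, b, h0 = params
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w_x * x + w_h * hs[-1] + b))
    return hs

def loss(params, xs, targets):
    """Mean of 0.5 * squared error over the time steps."""
    hs = forward(params, xs)
    return sum(0.5 * (h - t) ** 2 for h, t in zip(hs[1:], targets)) / len(xs)

def bptt_grads(params, xs, targets):
    """Back-propagation through time; the last entry is the gradient w.r.t. h0."""
    w_x, w_h, b, h0 = params
    hs = forward(params, xs)
    n = len(xs)
    g_wx = g_wh = g_b = 0.0
    dh_next = 0.0  # gradient arriving from later time steps
    for t in reversed(range(n)):
        dh = (hs[t + 1] - targets[t]) / n + dh_next
        da = dh * (1.0 - hs[t + 1] ** 2)  # through tanh
        g_wx += da * xs[t]
        g_wh += da * hs[t]
        g_b += da
        dh_next = da * w_h  # pass back to the previous output
    # What flows out of the first unfolded step is the gradient for h0.
    return [g_wx, g_wh, g_b, dh_next]

def train_by_stages(params, xs, targets, lengths, lr=0.2, tol=1e-4, max_steps=300):
    """Gradient descent on progressively longer prefixes of the sequence."""
    for n in lengths:
        frag_x, frag_t = xs[:n], targets[:n]
        for _ in range(max_steps):
            if loss(params, frag_x, frag_t) < tol:
                break  # error small enough: move on to longer fragments
            grads = bptt_grads(params, frag_x, frag_t)
            params = [p - lr * g for p, g in zip(params, grads)]
    return params

# Toy data: a hypothetical teacher RNN stands in for the HMM-derived targets.
xs = [math.sin(0.7 * t) for t in range(8)]
teacher = [0.8, 0.5, 0.1, 0.3]          # w_x, w_h, b, h0 (assumed values)
targets = forward(teacher, xs)[1:]

init = [0.1, 0.1, 0.0, 0.0]             # h0 starts at 0 and is then trained
trained = train_by_stages(init, xs, targets, lengths=[2, 4, 8])
print("loss before:", loss(init, xs, targets))
print("loss after: ", loss(trained, xs, targets))
```

Because `h0` receives a gradient like any other weight, its value is fitted to the data rather than fixed by hand, and the `lengths` argument plays the role of the short-to-long fragment schedule described in technique 4.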

Recurrent neural network for spectral mapping in speech bandwidth extension

Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks

Restoring High Frequency Spectral Envelopes Using Neural Networks For Speech Bandwidth Extension

Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension

A Novel Unified Framework for Speech Enhancement and Bandwidth Extension Based on Jointly Trained Neural Networks

A Novel Research to Artificial Bandwidth Extension Based on Deep BLSTM Recurrent Neural Networks and Exemplar-Based Sparse Representation

High Fidelity Speech Enhancement with Band-split RNN

An RNN-based Speech Enhancement Method for a Binaural Hearing Aid System

DSP-informed bandwidth extension using locally-conditioned excitation and linear time-varying filter subnetworks

Voice Conversion Using Deep Neural Networks with Layer-Wise Generative Training

Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension

Vector Quantized Diffusion Model Based Speech Bandwidth Extension

Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction

An Experimental Study on Joint Modeling of Mixed-Bandwidth Data Via Deep Neural Networks for Robust Speech Recognition

Advanced Recurrent Network-Based Hybrid Acoustic Models for Low Resource Speech Recognition

A New Real-Time Noise Suppression Algorithm for Far-Field Speech Communication Based on Recurrent Neural Network

Time-Domain Neural Network Approach for Speech Bandwidth Extension

Spectral-spatial classification of hyperspectral imagery based on recurrent neural networks

Recurrent Neural Networks and Acoustic Features for Frame-Level Signal-to-Noise Ratio Estimation

Speech Recognition Model Based on Recurrent Neural Networks

BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution