Abstract: In human-computer interaction, even the more advanced speech recognition systems in current use handle only a single language, and new deep learning techniques are urgently needed to improve them. Against this background, this study builds on deep neural networks (DNNs) to investigate mixed speech recognition for Chinese and English. A DNN-based single-language speech recognition algorithm is investigated first, and a new hybrid Chinese-English speech recognition model is then constructed by fusing the attention mechanism with the CTC loss function. The hybrid model uses an end-to-end Transformer framework combined with the monotonic alignment property of the CTC loss, which allows complex sound units to be converted into characters for easier extraction and recognition. The models were tested on a Chinese speech dataset, an English speech dataset, and a mixed Chinese-English speech dataset to measure recognition accuracy and speed. The results show that the proposed model achieves 81.2% recognition accuracy and a recognition speed of 100 per minute on the mixed Chinese-English dataset, outperforming the other three models compared. The proposed hybrid model therefore addresses the need for improved speech recognition of mixed Chinese-English speech, and it has the potential to enhance human-computer interaction and support efficient communication between Chinese and English speakers.

Deep Neural Network-based Mixed Speech Recognition Technology for Chinese and English
