Abstract: In human-computer interaction, even the more advanced speech recognition systems currently in use recognize only a single language, and new deep learning techniques are urgently needed to improve them. Against this background, this study builds on deep neural networks (DNNs) to investigate mixed speech recognition for Chinese and English. A DNN-based single-language speech recognition algorithm is investigated first, and a new hybrid Chinese-English speech recognition model is then constructed by fusing the attention mechanism with the connectionist temporal classification (CTC) loss function. The hybrid model uses an end-to-end design and the Transformer framework together with the monotonic alignment property of the CTC loss function, which allows complex sound units to be converted into characters for easier extraction and recognition. The constructed models were tested on a Chinese speech dataset, an English speech dataset, and a mixed Chinese-English speech dataset to determine their recognition accuracy and speed. The results show that the proposed model achieves a recognition accuracy of 81.2% and a recognition speed of 100 per minute on the mixed Chinese-English dataset, which is much better than the other three models. The study thus addresses the need for improved speech recognition systems with a novel hybrid model for mixed Chinese-English speech, and the experimental results confirm its advantage in both accuracy and speed. The model shows promise for enhancing human-computer interaction and enabling efficient communication between Chinese and English speakers.
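The fusion of an attention-based objective with the CTC loss can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes the commonly used interpolation of the two losses; the weight `lam`, the blank index `BLANK`, and all function names are hypothetical and chosen only for illustration. `ctc_collapse` shows the monotonic mapping from frame-level labels to output symbols, and `ctc_neg_log_likelihood` computes the CTC loss for one utterance with the standard forward algorithm.

```python
import math

BLANK = 0  # assumed index of the CTC blank symbol (not specified in the abstract)


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a frame-level label path to an output sequence:
    merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out


def ctc_neg_log_likelihood(probs, target, blank=BLANK):
    """CTC loss for one utterance via the forward algorithm.
    probs[t][k] is the probability of label k at frame t."""
    # Extended target: blanks around and between the target labels.
    ext = [blank]
    for lab in target:
        ext += [lab, blank]
    size = len(ext)
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new
    p = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(p) if p > 0.0 else math.inf


def joint_loss(ctc_loss, attention_loss, lam=0.3):
    """Assumed hybrid objective: weighted sum of the CTC and attention losses."""
    return lam * ctc_loss + (1.0 - lam) * attention_loss


if __name__ == "__main__":
    uniform = [[0.5, 0.5], [0.5, 0.5]]  # two frames, labels {blank, 1}
    print(ctc_collapse([1, 1, 0, 1, 2, 2, 0]))
    print(ctc_neg_log_likelihood(uniform, [1]))
    print(joint_loss(2.0, 1.0, lam=0.5))
```

In this sketch the CTC term only admits left-to-right alignments between frames and output symbols, while the attention term (represented here by a plain number) would come from the Transformer decoder's cross-entropy; how the two are actually weighted in the proposed model is not stated in the abstract.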

Neural Network Ensemble Based on Vowel Classification for Chinese Speaker Recognition

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

A Chinese Acoustic Model Based On Convolutional Neural Network

New Neural Network Architecture with Application in Mandarin Digit Speech Recognition

Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features

Visual-Audio Emotion Recognition Based on Multi-Task and Ensemble Learning with Multiple Features

Parallel Neural Networks for Speaker-Independent All-Chinese-syllable Speech Recognition

Deep Neural Network-based Mixed Speech Recognition Technology for Chinese and English

A New Neural Network Oriented Speech Recognition

Naive Bayes and BiLSTM Ensemble for Discriminating between Mainland and Taiwan Variation of Mandarin Chinese

Mongolian acoustic modeling based on deep neural network

An Anti-noise Algorithm for Speaker-Dependent Mandarin Vocal Numbers Recognition

Chinese Vowels Recognition Method Based on Non-Homogeneous Hidden Markov Model

A comparative study on selecting acoustic modeling units in deep neural networks based large vocabulary Chinese speech recognition

Robust Recognition of Mandarin Vowels by Articulatory Manners

A Chinese Speech Recognition System Based on Articulatory Features

Deep neural networks for syllable based acoustic modeling in Chinese speech recognition

Recognition of Chinese Continuous Speech Based on 3-Dimension Viterbi Search

Chinese Dialect Speech Recognition Based on End-to-end Machine Learning

Combining wav2vec 2.0 Fine-Tuning and ConLearnNet for Speech Emotion Recognition

Two-stage Training for Chinese Dialect Recognition