Abstract: In human-computer interaction, today's more advanced speech recognition systems handle only a single language, and new deep learning techniques are needed to improve them. This study therefore uses deep neural networks (DNNs) to investigate mixed Chinese-English speech recognition. A DNN-based single-language recognition algorithm is investigated first; a new hybrid Chinese-English recognition model is then constructed by fusing the attention mechanism with the CTC loss function. In the hybrid model, an end-to-end model and the Transformer framework are combined with the monotonic alignment property of the CTC loss function, which allows complex sound units to be converted into characters for easier extraction and recognition. The models were tested on a Chinese speech dataset, an English speech dataset, and a mixed Chinese-English speech dataset to measure their recognition accuracy and speed. On the mixed Chinese-English dataset, the proposed model achieves 81.2% recognition accuracy and a recognition speed of 100 per minute, outperforming the three comparison models. These results indicate that the proposed hybrid model meets the need for mixed Chinese-English speech recognition, and that it has potential for enhancing human-computer interaction and supporting efficient communication between Chinese and English speakers.
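
The fusion of an attention decoder with the CTC loss can be illustrated with a small sketch. The abstract does not state how the two terms are combined, so the weighted sum below (a common formulation in hybrid CTC/attention systems) is an assumption, as are the function names, the blank index, and the default weight of 0.3. The CTC term is computed with the standard forward algorithm, which sums only over monotonic alignments between frames and labels.

```python
import math

BLANK = 0  # index of the CTC blank symbol (an assumption of this sketch)


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the given values."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target):
    """CTC loss for one utterance via the forward algorithm.

    log_probs[t][k] is the log-probability of symbol k at frame t;
    target is a list of non-blank label ids.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]
    num_frames, num_states = len(log_probs), len(ext)

    alpha = [-math.inf] * num_states
    alpha[0] = log_probs[0][BLANK]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        new = [-math.inf] * num_states
        for s in range(num_states):
            cands = [alpha[s]]                      # stay on the same state
            if s >= 1:
                cands.append(alpha[s - 1])          # advance by one state
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])          # skip a blank between distinct labels
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end in the last label or the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if num_states > 1 else alpha[-1]
    return -total


def attention_neg_log_likelihood(step_log_probs, target):
    """Cross-entropy of an attention decoder: one distribution per output token."""
    return -sum(step_log_probs[i][y] for i, y in enumerate(target))


def joint_loss(ctc_log_probs, att_log_probs, target, ctc_weight=0.3):
    """Weighted sum of the CTC and attention losses (assumed combination rule)."""
    ctc = ctc_neg_log_likelihood(ctc_log_probs, target)
    att = attention_neg_log_likelihood(att_log_probs, target)
    return ctc_weight * ctc + (1.0 - ctc_weight) * att
```

In a mixed Chinese-English setting, the label ids in `target` would index a shared output vocabulary covering both languages; how the paper builds that vocabulary is not stated here.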
