Abstract: Sound is an important carrier of information. Because it can be collected quickly and is not limited by viewing angle or lighting, it is often used to help understand the environment and generate information. Voice signal recognition is a typical application of speech recognition. This article examines voice signal recognition technology built on various deep learning models. Deep neural networks of different structures and types can extract information and representations relevant to recognizing sound signal samples, which further improves the detection accuracy of a sound signal recognition system. On this basis, this paper proposes an enhanced deep learning model, a multi-scale convolutional neural network, and uses it to recognize sound signals. A CCCP layer reduces the dimensionality of the underlying feature maps, so that the units in the network capture internal features at each layer, retaining feature information to the greatest extent and forming multi-scale convolutional neurons in the deep network. Finally, the article discusses Japanese speech recognition on this basis. It first uses graphemes as modeling units, that is, the Japanese kana and commonly used Chinese characters (kanji), 2795 units in total. In experiments, a large gap remains between this system and the BiLSTM-HMM system. In addition, prior knowledge of Japanese speech is incorporated into the end-to-end recognition system to improve the performance of the Japanese speech recognition system. Based on the deep learning and sound signal analysis principles and experiments above, the final result is better than that of mainstream Japanese speech recognition systems based on the hidden Markov model and the long short-term memory network, thereby promoting the development of Japanese speech recognition.

Japanese Large-Vocabulary Continuous Speech Recognition System Based on Microsoft Whisper

Wavoice: A mmWave-assisted Noise-resistant Speech Recognition System

Wavoice: A Noise-resistant Multi-modal Speech Recognition System Fusing mmWave and Audio Signals

Real-time multilingual speech recognition and speaker diarization system based on Whisper segmentation

Exploring Native and Non-Native English Child Speech Recognition With Whisper

mmWave-Whisper: Phone Call Eavesdropping and Transcription Using Millimeter-Wave Radar

A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting

A Study on Incorporating Whisper for Robust Speech Assessment

Mandarin Continuous Digit Speech Recognition System

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper

Mai Ho'omāuna i ka 'Ai: Language Models Improve Automatic Speech Recognition in Hawaiian

Leveraging Self-Supervised Models for Automatic Whispered Speech Recognition

Sound signal analysis in Japanese speech recognition based on deep learning algorithm

MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR

Speaker-Independent English Consonant and Japanese Word Recognition by a Stochastic Dynamic Time Warping Method

Efficient Embedded Speech Recognition for Very Large Vocabulary Mandarin Car-Navigation Systems