Abstract: Given personal privacy concerns and the difficulty of obtaining training material for many seldom-used English words and (often non-English) names, language-independent (LI), lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a promising solution. The dynamic time warping (DTW) algorithm is the state-of-the-art algorithm for small-footprint SD ASR applications with limited storage space and a small vocabulary, such as voice dialing on mobile devices, menu-driven recognition, and voice control in vehicles and robotics. Although we have previously developed two fast and accurate DTW variations for clean speech data, speech recognition in adverse conditions remains a major challenge. To improve recognition accuracy in noisy environments and under bad recording conditions, such as volume that is too high or too low, we introduce a novel one-against-all weighted DTW (OAWDTW). This method defines a one-against-all index (OAI) for each time frame of the training data and applies the OAIs in the core DTW process. Given two speech signals, OAWDTW tunes their final alignment score by using the OAIs in the DTW process. Our method achieves better accuracy than both DTW and merge-weighted DTW (MWDTW): in our extensive experiments on one representative SD dataset of four speakers' recordings, we observe a 6.97% relative reduction of error rate (RRER) compared with DTW and a 15.91% RRER compared with MWDTW. To the best of our knowledge, OAWDTW is the first weighted DTW approach specifically designed for speech data in adverse conditions.
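
To make the idea of applying a per-frame weight inside the DTW recursion concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not say how the OAIs are computed or exactly where they enter the recursion, so the `weights` argument here is a hypothetical stand-in: one scalar per template (training) frame that scales the local frame distance. The function names, the Euclidean local distance, and the three-way step pattern are likewise assumptions chosen for illustration. The helper `relative_error_reduction` uses the standard definition of a relative reduction of error rate, (baseline - new) / baseline.

```python
import math


def weighted_dtw(template, query, weights=None):
    """Accumulated DTW alignment cost between two feature sequences.

    template, query: sequences of equal-dimension feature vectors, one per frame.
    weights: hypothetical per-frame weights for the template frames
             (an assumed stand-in for OAIs); None gives plain DTW.
    """
    n, m = len(template), len(query)
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("need exactly one weight per template frame")

    # cost[i][j] = best accumulated cost aligning the first i template
    # frames with the first j query frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance, scaled by the weight of the template frame.
            local = weights[i - 1] * math.dist(template[i - 1], query[j - 1])
            # Standard step pattern: insertion, deletion, or match.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def relative_error_reduction(baseline_error, new_error):
    """Relative reduction of error rate: (baseline - new) / baseline."""
    return (baseline_error - new_error) / baseline_error
```

In this sketch, a template frame with a smaller weight contributes less to the final alignment score, which is one simple way a per-frame index could tune the score between two signals. In an isolated-word recognizer, the query would be scored against every stored template and assigned the label of the lowest-cost one.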

One-against-all weighted dynamic time warping for language-independent and speaker-dependent speech recognition in adverse conditions
