Abstract: Precise detection of speech endpoints is an important factor affecting the performance of systems that must extract speech utterances from a speech signal, such as Automatic Speech Recognition (ASR) systems. Existing endpoint detection (EPD) methods mostly use Short-Term Energy (STE) and Zero-Crossing Rate (ZCR) based approaches and their variants. However, STE- and ZCR-based EPD algorithms often fail in the presence of Non-speech Sound Artifacts (NSAs) produced by speakers. Algorithms based on pattern recognition and classification techniques have also been proposed, but they require labeled data for training. This article proposes a new algorithm, termed Wavelet Convolution based Speech Endpoint Detection (WCSEPD), to extract speech endpoints. WCSEPD decomposes the speech signal into high-frequency and low-frequency components using wavelet convolution and computes entropy-based thresholds for each of the two components. The low-frequency thresholds are used to extract voiced speech segments, whereas the high-frequency thresholds are used to extract unvoiced speech segments by filtering out the NSAs. WCSEPD requires no labeled training data and extracts speech segments automatically. Experimental results show that the proposed algorithm precisely extracts speech endpoints in the presence of NSAs.
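
The STE and ZCR features that the abstract names as the basis of most existing EPD methods can be sketched in a few lines of Python. This is a minimal illustration, not any specific published detector: the frame length, hop size, the dual-threshold rule, and the names `frames`, `short_term_energy`, `zero_crossing_rate` and `detect_endpoints` are assumptions made for this example, and the input is a synthetic tone rather than real speech.

```python
import math


def frames(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def short_term_energy(frame):
    """STE: sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """ZCR: fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def detect_endpoints(signal, frame_len=160, hop=80, high=0.3, low=0.05, zcr_min=0.3):
    """Return (first, last) speech-frame indices, or None if no frame qualifies.

    Hypothetical dual-threshold rule: a frame counts as speech if its STE exceeds
    high * peak STE (voiced-like), or if its STE exceeds low * peak STE and its
    ZCR exceeds zcr_min (unvoiced-like). All three ratios are illustrative.
    """
    fs = frames(signal, frame_len, hop)
    energies = [short_term_energy(f) for f in fs]
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return None
    speech = [i for i, (f, e) in enumerate(zip(fs, energies))
              if e > high * peak or (e > low * peak and zero_crossing_rate(f) > zcr_min)]
    return (speech[0], speech[-1]) if speech else None


# Synthetic example: 800 silent samples, an 800-sample tone (period 40), 800 silent samples.
tone = [math.sin(2 * math.pi * n / 40) for n in range(800)]
clean = [0.0] * 800 + tone + [0.0] * 800
# A short alternating-sign click in the trailing silence stands in for an NSA.
click = [1.0 if k % 2 == 0 else -1.0 for k in range(40)]
with_click = clean[:2200] + click + clean[2240:]

print(detect_endpoints(clean))       # (9, 19)
print(detect_endpoints(with_click))  # (9, 27)
```

Adding the 40-sample click moves the detected end point from frame 19 to frame 27, because the click passes the same energy test as the tone. This is the kind of failure the abstract attributes to STE- and ZCR-based detectors when NSAs are present.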

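The two ingredients of WCSEPD named in the abstract, a wavelet-convolution split into low- and high-frequency components and entropy-based thresholds, can be illustrated with the sketch below. It is not the authors' implementation. The abstract names neither the wavelet nor the entropy criterion, so this example assumes a one-level Haar wavelet and swaps in Kapur-style maximum-entropy histogram thresholding; the frame sizes, bin count, and all function names are hypothetical.

```python
import math

# Assumed wavelet: one-level Haar. The abstract does not say which wavelet WCSEPD uses.
SQRT2 = math.sqrt(2.0)
LOW_PASS = [1 / SQRT2, 1 / SQRT2]    # Haar scaling (low-pass) filter
HIGH_PASS = [1 / SQRT2, -1 / SQRT2]  # Haar wavelet (high-pass) filter


def convolve(x, h):
    """Full linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k, hk in enumerate(h):
            if 0 <= n - k < len(x):
                y[n] += hk * x[n - k]
    return y


def haar_split(x):
    """One-level DWT: convolve with each filter, then keep every second output.

    Odd indices of the full convolution combine the pairs (x[2m], x[2m + 1]),
    so an even-length input yields len(x) // 2 coefficients per band.
    """
    n_out = len(x) // 2
    low = convolve(x, LOW_PASS)[1::2][:n_out]
    high = convolve(x, HIGH_PASS)[1::2][:n_out]
    return low, high


def frame_energies(band, frame_len, hop):
    """Energy (sum of squares) of each full frame of one band."""
    return [sum(v * v for v in band[i:i + frame_len])
            for i in range(0, len(band) - frame_len + 1, hop)]


def max_entropy_threshold(values, n_bins=16):
    """Kapur-style threshold (a stand-in for WCSEPD's unspecified entropy rule).

    Histogram the values, then pick the bin boundary t that maximises the sum of
    the Shannon entropies of the normalised histograms below and above t.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    p = [c / len(values) for c in counts]

    def entropy(part):
        total = sum(part)
        return -sum(q / total * math.log(q / total) for q in part if q > 0)

    # The minimum lands in bin 0 and the maximum in the last bin, so both
    # halves are non-empty for every t in 1 .. n_bins - 1.
    best_t, best_h = 1, -math.inf
    for t in range(1, n_bins):
        h = entropy(p[:t]) + entropy(p[t:])
        if h > best_h:
            best_t, best_h = t, h
    return lo + best_t * width


def active_frames(band, frame_len, hop):
    """Indices of frames whose energy exceeds the band's entropy-based threshold."""
    energies = frame_energies(band, frame_len, hop)
    threshold = max_entropy_threshold(energies)
    return [i for i, e in enumerate(energies) if e > threshold]


# Synthetic input: silence, an 800-sample tone (period 40 samples), silence.
tone = [math.sin(2 * math.pi * n / 40) for n in range(800)]
signal = [0.0] * 800 + tone + [0.0] * 800
low_band, high_band = haar_split(signal)
# Each band runs at half the input rate, so 80/40 here spans 160/80 input samples.
voiced = active_frames(low_band, frame_len=80, hop=40)
print(voiced)  # [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```

Because the Haar filters are orthonormal, the energies of the two bands sum to the energy of the input, which is a quick sanity check on the convolution and downsampling. Calling `active_frames` on `high_band` applies the same thresholding to the high-frequency component; the NSA filtering that WCSEPD performs there is not reproduced in this sketch.
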
Speech endpoint detection based on frequency domain and time domain analyses

A Novel and Efficient Voice Activity Detector Using Shape Features of Speech Wave

Research on real-time detection technology of Chinese voiced speech pitch

Implementation of Abnormal Sound Detection in Intelligent Surveillance Front-end System

A Pitch Period Detection Algorithm Using Time and Frequency Analyses

Effective Speech Endpoint Detection Algorithm For Voiceprint Recognition

Speech Endpoint Identification Based on Empirical Mode Decomposition

A Recursive Calculating Algorithm for Higher-Order Cumulants over Sliding Window and Its Application in Speech Endpoint Detection

Endpoint Detect Method of Embedded Speech Recognition System

Detection of fricative and vowels in speech signals

Endpoint detection and pitch determination method based on a probability model

Detection of Time Varying Pitch in Tonal Languages: an Approach Based on Ensemble Empirical Mode Decomposition

Precise Detection of Speech Endpoints Dynamically: A Wavelet Convolution based approach

Detection of Dynamic Structures of Speech Fundamental Frequency in Tonal Languages

The Research on Pitch Extraction Method for Voice Activity Detection Based on Periodic Decomposition

A Power Spectrum Reprocessing Algorithm for Pitch Detection of Speech

An Improved Speech Detection Algorithm Based on Time-domain Parameter

Endpoint detection of speech signal based on empirical mode decomposition and Teager kurtosis

Voice Activity Detection Based on Wavelet Multiresolution Spectrum

Detection Of Spectral Transition For Speech Perception Based On Time-Frequency Analysis