Abstract: Pitch frequency is a fundamental feature in speaker identification. However, when speakers record their speech in a closed room, reverberation distorts the signal and compromises this feature. The distortion not only reduces the effectiveness of speaker identification systems, but also opens the door to deception by attackers who exploit the reverberation effects of closed rooms. Correcting the estimated pitch frequencies is therefore an essential step for reliable speaker identification. This paper presents a Hybrid Approach for Estimating Pitch Frequency (HAEPF) that integrates the Zero Crossing Rate (ZCR) and Auto-Correlation Function (ACF) methods. The paper also models reverberant speech using comb filtering, showing how multiple reflections affect the accuracy of pitch frequency estimation. Several simulation experiments were conducted to assess pitch frequency estimation for speech signals, both with and without reverberation. Estimation errors were calculated for three reverberation scenarios (mild, moderate, and severe). The results indicate that as the degree of reverberation, characterized by the comb filter order, increases, the pitch frequency estimation error also increases. The estimation accuracy of the proposed approach is measured in terms of Pitch Frequency Estimation Error (PFEE), Gross Pitch Error (GPE), and Octave Error (OER), and is compared with that of several established pitch frequency estimation methods. The proposed approach shows a notable improvement even in noisy environments, reducing PFEE by 43% and achieving GPE and OER of less than 0.3 and 0.12, respectively, at a Signal-to-Noise Ratio (SNR) of 0 dB.
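To make the ingredients named in the abstract concrete, the following minimal Python sketch shows a zero-crossing-rate measure, an autocorrelation-based pitch estimate, and a comb-filter model of reverberation. It is not the HAEPF algorithm itself: the abstract does not say how ZCR and ACF are combined, so using ZCR as a simple voicing gate in front of the ACF estimator is an assumption made here for illustration. The function names, the feed-forward form of the comb filter, the `zcr_max` threshold, and the 60 to 400 Hz search range are likewise hypothetical choices.

```python
import numpy as np

def comb_reverb(x, delay, gain, order):
    """Assumed feed-forward comb model: y[n] = sum_{k=0..order} gain**k * x[n - k*delay]."""
    y = np.zeros(len(x) + order * delay)
    for k in range(order + 1):
        # Add the k-th reflection, delayed by k*delay samples and attenuated by gain**k.
        y[k * delay:k * delay + len(x)] += (gain ** k) * x
    return y[:len(x)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    positive = np.asarray(frame) >= 0
    return float(np.mean(positive[:-1] != positive[1:]))

def acf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Pitch estimate (Hz) from the autocorrelation peak within [fmin, fmax]."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Keep only non-negative lags of the full autocorrelation.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(fs / fmax)                        # shortest candidate period
    hi = min(int(fs / fmin), len(acf) - 1)     # longest candidate period
    lag = lo + int(np.argmax(acf[lo:hi + 1]))
    return fs / lag

def hybrid_pitch(frame, fs, zcr_max=0.3):
    """Hypothetical combination: skip noise-like frames (high ZCR), else use ACF."""
    if zero_crossing_rate(frame) > zcr_max:
        return None  # treated as unvoiced under this assumed rule
    return acf_pitch(frame, fs)

if __name__ == "__main__":
    fs = 8000
    n = np.arange(800)
    clean = np.sin(2 * np.pi * 200.0 * n / fs + 0.3)   # synthetic 200 Hz tone
    print("clean estimate (Hz):", hybrid_pitch(clean[:400], fs))
    for order in (1, 3, 6):  # stand-ins for mild, moderate, severe reverberation
        wet = comb_reverb(clean, delay=37, gain=0.6, order=order)
        print("order", order, "estimate (Hz):", hybrid_pitch(wet[300:700], fs))
```

The pure tone is used only to exercise the functions; a stationary sinusoid stays periodic after comb filtering, so this demo is not expected to reproduce the error growth with filter order reported in the abstract. Observing that effect would require real or synthetic speech with time-varying pitch, together with the paper's own definitions of PFEE, GPE, and OER, which the abstract does not give.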

Pitch Synchronized Relative Phase with Peak Error Detection For Noise-robust Speaker Recognition

Pseudo-pitch-synchronized Phase Information Extraction and Its Application for Robust Speaker Recognition

Pitch envelope based frame level score reweighed algorithm for emotion robust speaker recognition

Simplified Deformation Compensation for Emotional Speaker Recognition

Learning Virtual HD Model for Bi-model Emotional Speaker Recognition

Toward Pitch-Insensitive Speaker Verification Via Soundfield

Cost-Sensitive Learning for Emotion Robust Speaker Recognition

Combining Mfcc And Pitch To Enhance The Performance Of The Gender Recognition

Speech Enhancement Based On Analysis Synthesis Framework With Improved Pitch Estimation And Spectral Envelope Enhancement

Noise Estimation Using Mean Square Cross Prediction Error for Speech Enhancement

Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions

Robust Multipitch Estimation Of Piano Sounds Using Deep Spiking Neural Networks

Noise Robust Voice Activity Detection Using Joint Phase and Magnitude Based Feature Enhancement

The predictive differential amplitude spectrum for robust speaker recognition in stationary noises

HAEPF: hybrid approach for estimating pitch frequency in the presence of reverberation

Mandarin Isolated Words Recognition Method Based on Pitch Contour

Noise-robustness of speaker verification based on the perceptual log area ratio

Using Subband Mel-spectrum Centroid and Gaussian Mixture Correlation for Robust Speaker Identification

A Pitch Period Detection Algorithm Using Time and Frequency Analyses

Speaker Verification Using Simple Temporal Features and Pitch Synchronous Cepstral Coefficients

A pitch-based rapid speech segmentation for speaker indexing