Detection of fricative and vowels in speech signals
Avinash Kumar, Syed Shahnawazuddin
DOI: https://doi.org/10.1007/s11042-024-19623-9
IF: 2.577
2024-06-19
Multimedia Tools and Applications
Abstract: This paper presents a novel approach for detecting fricatives and vowels within a speech segment. Vowels and fricatives differ in their place of articulation. In addition, vowels are longer in duration and nearly periodic. To capture the spectral information relevant to fricatives and vowels, we exploit Mel-frequency cepstral coefficients (MFCC) and inverse-Mel-frequency cepstral coefficients (IMFCC) for front-end speech parameterization. The MFCC features are designed to down-sample the spectral information in the high-frequency region. The IMFCC features, on the other hand, preserve this critical high-frequency spectral information. Consequently, the simultaneous use of the two feature types helps in discriminating vowels from fricatives. To automatically detect fricatives and vowels, two deep-neural-network-based phoneme classifiers are developed, one using MFCC features and the other using IMFCC features. Any given speech sample is then force-aligned against the trained acoustic models to obtain frame-level time alignments. The fricatives and vowels, together with their corresponding boundaries, are then determined from the generated time alignments.
computer science, information systems, theory & methods,engineering, electrical & electronic, software engineering
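The last step of the abstract, turning frame-level time alignments into fricative and vowel boundaries, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 10 ms frame shift, the phone labels, the `VOWELS` and `FRICATIVES` sets, and both function names are assumptions made for illustration; the abstract does not specify any of them.

```python
from itertools import groupby

# Assumed frame shift of 10 ms; the abstract does not state the value used.
FRAME_SHIFT_S = 0.010

# Hypothetical phone inventories, for illustration only.
VOWELS = {"aa", "iy", "uw"}
FRICATIVES = {"s", "f", "sh"}


def segments_from_alignment(frame_labels, frame_shift=FRAME_SHIFT_S):
    """Collapse per-frame phone labels into (label, start_s, end_s) runs."""
    segments = []
    start_frame = 0
    for label, run in groupby(frame_labels):
        n_frames = sum(1 for _ in run)
        end_frame = start_frame + n_frames
        segments.append((label, start_frame * frame_shift, end_frame * frame_shift))
        start_frame = end_frame
    return segments


def fricatives_and_vowels(segments):
    """Keep only vowel and fricative segments, tagged with their class."""
    detected = []
    for label, start_s, end_s in segments:
        if label in VOWELS:
            detected.append(("vowel", label, start_s, end_s))
        elif label in FRICATIVES:
            detected.append(("fricative", label, start_s, end_s))
    return detected


if __name__ == "__main__":
    # Toy alignment: silence, a fricative, a vowel, silence.
    frames = ["sil"] * 3 + ["s"] * 5 + ["iy"] * 8 + ["sil"] * 2
    for item in fricatives_and_vowels(segments_from_alignment(frames)):
        print(item)
```

In the paper's setting the per-frame labels would come from forced alignment against the trained MFCC-based and IMFCC-based acoustic models. How the two alignments are combined is not described in the abstract, so the sketch handles a single label sequence only.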