Phonetic Segmentation Using Knowledge from Visual and Perceptual Domain

Bhavik Vachhani, Chitralekha Bhat, Sunil Kopparapu
DOI: https://doi.org/10.1007/978-3-319-64206-2_44
2017-01-01
Abstract: Accurate and automatic phonetic segmentation is crucial for several speech-based applications such as phone-level articulation analysis and error detection, speech synthesis, annotation, speech recognition and emotion recognition. In this paper we examine the effectiveness of visual features, obtained by processing the spectrogram image of a speech utterance, when applied to phonetic segmentation. Further, we propose a mechanism to combine knowledge from the visual and perceptual domains for automatic phonetic segmentation. This process can be considered analogous to manual phonetic segmentation. The technique was evaluated on the TIMIT American English corpus. Experimental results show significant improvements in phonetic segmentation, especially at the lower tolerances of 5, 10 and 15 ms; on the TIMIT database, an absolute improvement of 8.29% is observed at a 10 ms tolerance.
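The tolerance figures quoted above refer to how close a detected phone boundary must lie to the reference boundary to count as correct. The abstract does not spell out the exact scoring procedure, so the following is only a minimal sketch of one common way to compute boundary agreement within a tolerance. The function name `boundary_accuracy`, the one-to-one greedy matching, and the boundary times are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def boundary_accuracy(
    reference_ms: Sequence[float],
    predicted_ms: Sequence[float],
    tolerance_ms: float,
) -> float:
    """Fraction of reference boundaries matched by a predicted boundary
    lying within +/- tolerance_ms. Each prediction is used at most once.

    Assumption: this greedy one-to-one matching is illustrative only;
    the paper's actual evaluation protocol may differ.
    """
    if not reference_ms:
        raise ValueError("reference_ms must not be empty")

    unused = sorted(predicted_ms)
    hits = 0
    for ref in sorted(reference_ms):
        # Find the closest still-unused prediction inside the tolerance window.
        best_index = None
        for i, pred in enumerate(unused):
            distance = abs(pred - ref)
            if distance <= tolerance_ms and (
                best_index is None or distance < abs(unused[best_index] - ref)
            ):
                best_index = i
        if best_index is not None:
            hits += 1
            unused.pop(best_index)
    return hits / len(reference_ms)


# Hypothetical boundary times in milliseconds (not data from the paper).
reference = [50.0, 120.0, 200.0, 310.0]
predicted = [53.0, 131.0, 198.0, 330.0]

for tol in (5, 10, 15):
    print(f"{tol} ms tolerance: {boundary_accuracy(reference, predicted, tol):.2f}")
```

With these made-up values the predicted boundaries are off by 3, 11, 2 and 20 ms, so tightening the tolerance from 15 ms to 5 ms drops the score from 0.75 to 0.50. This is why gains at the 5 to 15 ms tolerances are the hardest to obtain and the most informative about boundary precision.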