Abstract: Using phonological speech vocoding, we propose a platform for exploring relations between phonology and speech processing and, in broader terms, between the abstract and physical structures of a speech signal. Our goal is to take a step towards bridging phonology and speech processing and to contribute to the program of Laboratory Phonology.

We show three application examples for laboratory phonology: compositional phonological speech modelling, a comparison of phonological systems, and an experimental phonological parametric text-to-speech (TTS) system. The featural representations of the following three phonological systems are considered in this work: (i) Government Phonology (GP), (ii) the Sound Pattern of English (SPE), and (iii) the extended SPE (eSPE). Comparing GP- and eSPE-based vocoded speech, we conclude that the latter achieves slightly better results than the former. However, GP – the most compact phonological speech representation – performs comparably to the systems with a higher number of phonological features. The parametric TTS based on phonological speech representation, trained from an unlabelled audiobook in an unsupervised manner, achieves an intelligibility of 85% of that of state-of-the-art parametric speech synthesis.

We envision that the presented approach paves the way for researchers in both fields to form meaningful hypotheses that are explicitly testable using the concepts developed and exemplified in this paper. On the one hand, laboratory phonologists might test the applied concepts of their theoretical models; on the other hand, the speech processing community may use the concepts developed for theoretical phonological models to improve current state-of-the-art applications.

Impact of Irregular Pronunciation on Phonetic Segmentation of Nijmegen Corpus of Casual Czech

Prak: An automatic phonetic alignment tool for Czech

Automatic Pitch-Synchronous Phonetic Segmentation with Context-Independent HMMs

A morphologically annotated longitudinal corpus of spoken Czech child–adult interactions

Experiments of ASR-based mispronunciation detection for children and adult English learners

Pronunciation recognition of English phonemes /ə/, /æ/, /ɑː/ and /ʌ/ using Formants and Mel Frequency Cepstral Coefficients

Phonological Level wav2vec2-based Mispronunciation Detection and Diagnosis Method

Testing in Noise Based on the First Adaptive Matrix Sentence Test in Slovak Language

Recurrent Neural Network Based Speaker Change Detection from Text Transcription Applied in Telephone Speaker Diarization System

Adaptive Frequency Cepstral Coefficients for Word Mispronunciation Detection

Towards spoken dialect identification of Irish

Speech vocoding for laboratory phonology

Polish Read Speech Corpus for Speech Tools and Services

Upper and Lower Bounds for the Multiplexing of Multiclass Markovian on/off Sources

Current Issues in Pronunciation Teaching to Non-Native Learners of English

Phonetic Segmentation Using Knowledge from Visual and Perceptual Domain

Phonetic Segmentation of the UCLA Phonetics Lab Archive

Reducing pronunciation lexicon confusion and using more data without phonetic transcription for pronunciation modeling

Difference between Written and Spoken Czech: The Case of Verbal Nouns Denoting an Action

Towards dialect-inclusive recognition in a low-resource language: are balanced corpora the answer?

Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes