Abstract:
Objective: Speech tests assess the ability of people with hearing loss to comprehend speech with a hearing aid or cochlear implant. The tests are usually at the word or sentence level, and few analyze errors at the phoneme level. There is therefore a need for an automated program that visualizes, in real time, the accuracy of phonemes in these tests.
Method: The program reads in stimulus-response pairs and obtains their phonemic representations from an open-source digital pronouncing dictionary. The stimulus phonemes are aligned with the response phonemes via a modification of the Levenshtein Minimum Edit Distance algorithm. Alignment is achieved via dynamic programming, with the costs of insertions, deletions, and substitutions modified according to phonological features. The accuracy for each phoneme is based on the F1-score. Accuracy is visualized with respect to place and manner (consonants) or height (vowels). Confusion matrices for the phonemes are used in an information transfer analysis of ten phonological features. A histogram of the information transfer for the features over a frequency-like range is presented as a phonemegram.
Results: The program was applied to two datasets. The first consisted of test data at the sentence and word levels. Stimulus-response sentence pairs from six volunteers with different degrees of hearing loss and modes of amplification were analyzed: four volunteers listened to sentences from a mobile auditory training app, and two listened to sentences from a clinical speech test. Stimulus-response word pairs from three lists were also analyzed. The second dataset consisted of published stimulus-response pairs from experiments in which 31 participants with cochlear implants listened to 400 Basic English Lexicon sentences from different talkers at four different SNR levels. In all cases, visualization was obtained in real time. Analysis of 12,400 actual and random pairs showed that the program was robust to the nature of the pairs.
Conclusion: It is possible to automate the alignment of phonemes extracted from stimulus-response pairs from speech tests in real time. The alignment then makes it possible to visualize the accuracy of responses via phonological features in two ways. Such visualization of phoneme alignment and accuracy could aid clinicians and scientists.
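
The alignment and scoring steps described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: the toy feature table, the gap cost of 1.0, the rule that a substitution costs twice the fraction of differing features, and the function names `sub_cost`, `align`, and `phoneme_f1` are all assumptions made for illustration. Phonemes are written as ARPAbet-style symbols, and `None` marks an insertion or deletion.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy feature table: (place, manner, voicing) for a few consonants.
FEATURES: Dict[str, Tuple[str, str, str]] = {
    "P": ("bilabial", "stop", "voiceless"),
    "B": ("bilabial", "stop", "voiced"),
    "T": ("alveolar", "stop", "voiceless"),
    "D": ("alveolar", "stop", "voiced"),
    "S": ("alveolar", "fricative", "voiceless"),
    "K": ("velar", "stop", "voiceless"),
}
GAP_COST = 1.0  # assumed cost of one insertion or one deletion

Pair = Tuple[Optional[str], Optional[str]]


def sub_cost(a: str, b: str) -> float:
    """Assumed substitution cost: 0 if identical, else scaled by feature mismatch."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        return 2.0  # phoneme missing from the toy table: treat as maximally different
    differing = sum(x != y for x, y in zip(fa, fb))
    return 2.0 * differing / len(fa)


def align(stim: List[str], resp: List[str]) -> List[Pair]:
    """Weighted Levenshtein alignment via dynamic programming with traceback."""
    n, m = len(stim), len(resp)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * GAP_COST
    for j in range(1, m + 1):
        cost[0][j] = j * GAP_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(stim[i - 1], resp[j - 1]),
                cost[i - 1][j] + GAP_COST,   # deletion of a stimulus phoneme
                cost[i][j - 1] + GAP_COST,   # insertion of a response phoneme
            )
    # Trace back from the bottom-right cell, preferring match/substitution on ties.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and abs(
            cost[i][j] - (cost[i - 1][j - 1] + sub_cost(stim[i - 1], resp[j - 1]))
        ) < 1e-9:
            pairs.append((stim[i - 1], resp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and abs(cost[i][j] - (cost[i - 1][j] + GAP_COST)) < 1e-9:
            pairs.append((stim[i - 1], None))
            i -= 1
        else:
            pairs.append((None, resp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def phoneme_f1(pairs: List[Pair], phoneme: str) -> float:
    """F1-score for one phoneme computed from aligned (stimulus, response) pairs."""
    tp = sum(1 for s, r in pairs if s == phoneme and r == phoneme)
    fn = sum(1 for s, r in pairs if s == phoneme and r != phoneme)
    fp = sum(1 for s, r in pairs if r == phoneme and s != phoneme)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    aligned = align(["K", "AE", "T"], ["K", "AE", "P"])  # "cat" heard as "cap"
    print(aligned)
    print({p: phoneme_f1(aligned, p) for p in ("K", "T", "P")})
```

Because the substitution cost shrinks as two phonemes share more features, the aligner prefers pairing T with P (same manner and voicing) over treating them as an unrelated deletion plus insertion, which is the motivation for feature-based costs in the first place.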

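The information transfer analysis can be sketched in the same spirit. The abstract does not state the formula, so this example assumes the standard relative transmitted information used in feature-based confusion analyses: the phoneme confusion counts are collapsed by feature value, the mutual information between stimulus and response feature values is computed, and the result is divided by the stimulus entropy. The function name `relative_info_transfer` and the example counts are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple


def relative_info_transfer(
    confusions: Dict[Tuple[str, str], float],
    feature_of: Dict[str, str],
) -> float:
    """Relative information transfer (0 to 1) for one phonological feature.

    confusions maps (stimulus phoneme, response phoneme) to a count;
    feature_of maps each phoneme to its value for the feature of interest.
    """
    # Collapse the phoneme confusion matrix to a feature-value confusion matrix.
    joint: Dict[Tuple[str, str], float] = defaultdict(float)
    for (stim, resp), count in confusions.items():
        joint[(feature_of[stim], feature_of[resp])] += count

    total = sum(joint.values())
    if total == 0:
        return 0.0
    row: Dict[str, float] = defaultdict(float)  # stimulus marginals
    col: Dict[str, float] = defaultdict(float)  # response marginals
    for (x, y), count in joint.items():
        row[x] += count
        col[y] += count

    # Mutual information in bits between stimulus and response feature values.
    transmitted = sum(
        (c / total) * math.log2(c * total / (row[x] * col[y]))
        for (x, y), c in joint.items()
        if c > 0
    )
    # Stimulus entropy in bits: the maximum information that could be transferred.
    entropy = -sum((c / total) * math.log2(c / total) for c in row.values() if c > 0)
    return transmitted / entropy if entropy > 0 else 0.0


if __name__ == "__main__":
    voicing = {"P": "voiceless", "T": "voiceless", "B": "voiced"}
    # Hypothetical counts: place errors (P heard as T) occur, but voicing is always kept.
    counts = {("P", "P"): 10, ("P", "T"): 5, ("B", "B"): 10}
    print(relative_info_transfer(counts, voicing))
```

A value near 1 means the feature survives transmission even when the phoneme itself is misheard, while a value near 0 means responses carry no information about that feature of the stimulus.
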
Visualization of Speech Perception Analysis via Phoneme Alignment: A Pilot Study
