Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate

Ilja Baumann, Dominik Wagner, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet
2023-08-01
Abstract: Recent findings show that pre-trained wav2vec 2.0 models are reliable feature extractors for various speaker characteristics classification tasks. We show that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be used as features for binary classification to distinguish between children with Cleft Lip and Palate (CLP) and a healthy control group. The results indicate that classification of CLP versus healthy voices reaches an accuracy of 100%, particularly with latent representations from the lower and middle encoder layers. We then probe the classifier for influencing factors using unseen out-of-domain healthy and pathological corpora that vary in age, spoken content, and acoustic conditions. Cross-pathology and cross-healthy tests reveal that the trained classifiers are unreliable when training data and out-of-domain test data are mismatched in, e.g., age, spoken content, or acoustic conditions.
Audio and Speech Processing, Sound
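
The layer-wise setup described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the pooling step or the classifier, so the sketch mean-pools frames over time and uses a nearest-centroid classifier, and the hypothetical `make_synthetic` helper generates random arrays in place of real wav2vec 2.0 hidden states. With a real model, per-layer hidden states could be obtained, for example, by calling Hugging Face's `Wav2Vec2Model` with `output_hidden_states=True`. The constant `shift` is only a crude stand-in for a train/test mismatch and does not reproduce the paper's experiments.

```python
import numpy as np


def pool_layers(hidden_states):
    """Mean-pool over the time axis: (..., layers, frames, dim) -> (..., layers, dim)."""
    return hidden_states.mean(axis=-2)


def fit_centroids(features, labels):
    """Return one centroid per class (0 = control, 1 = CLP), shape (2, dim)."""
    return np.stack([features[labels == c].mean(axis=0) for c in (0, 1)])


def predict(centroids, features):
    """Assign each utterance embedding to its nearest class centroid."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def layerwise_accuracy(train_x, train_y, test_x, test_y):
    """Train and score one classifier per layer; inputs are (n, layers, dim)."""
    accs = []
    for layer in range(train_x.shape[1]):
        centroids = fit_centroids(train_x[:, layer], train_y)
        preds = predict(centroids, test_x[:, layer])
        accs.append(float((preds == test_y).mean()))
    return accs


def make_synthetic(rng, n, layers=4, frames=20, dim=8, shift=0.0):
    """Hypothetical stand-in for per-layer hidden states (NOT real speech data)."""
    labels = np.arange(n) % 2                  # alternate control / CLP labels
    utts = rng.normal(size=(n, layers, frames, dim))
    utts[labels == 1] += 3.0                   # artificial class separation
    utts += shift                              # artificial domain mismatch
    return pool_layers(utts), labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_x, train_y = make_synthetic(rng, 40)
    test_x, test_y = make_synthetic(rng, 40)
    shifted_x, shifted_y = make_synthetic(rng, 40, shift=3.0)
    print("in-domain:", layerwise_accuracy(train_x, train_y, test_x, test_y))
    print("mismatched:", layerwise_accuracy(train_x, train_y, shifted_x, shifted_y))
```

Scoring each layer separately mirrors the abstract's comparison of lower, middle, and upper encoder layers, and evaluating on a shifted test set shows how a classifier that looks reliable in-domain can fail once test conditions differ from training.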