Abstract:

BACKGROUND AND OBJECTIVES: Characterizing the vocal tract geometry during speech is of interest to various research areas, including speech production modeling, motor control analysis, and speech therapy design. Real-time MRI (RT-MRI) is a reliable and non-invasive tool for this purpose. In most cases, it is necessary to know the contours of the individual articulators from the glottis to the lips. Several techniques have been proposed for segmenting vocal tract articulators, but most are limited to specific applications. Moreover, they often do not provide individualized contours for all soft-tissue articulators in a multi-speaker configuration.

METHODS: A Mask R-CNN network was trained to detect and segment the vocal tract articulator contours in two RT-MRI datasets with speech recordings of multiple speakers. Two post-processing algorithms were then proposed to convert the network's outputs into geometrical curves. Nine articulators were considered: the two lips, tongue, soft palate, pharynx, arytenoid cartilage, epiglottis, thyroid cartilage, and vocal folds. A leave-one-out cross-validation protocol was used to evaluate inter-speaker generalization. The evaluation metrics were the point-to-closest-point (P2CP) distance and the Jaccard index (for articulators annotated as closed contours).

RESULTS: The proposed method accurately segmented the vocal tract articulators, with an average root mean square point-to-closest-point distance (P2CP_RMS) of less than 2.2 mm for all the articulators in the leave-one-out cross-validation setting. The minimum P2CP_RMS was 0.91 mm for the upper lip, and the maximum was 2.18 mm for the tongue. The Jaccard indices for the thyroid cartilage and vocal folds were 0.60 and 0.61, respectively. Additionally, the method adapted to a new subject with only ten annotated samples.

CONCLUSIONS: Our research introduced a method for individually segmenting nine non-rigid vocal tract articulators in real-time MRI movies. The software is openly available to the speech community as an installable package. It is intended to support the development of speech applications and clinical and non-clinical research in fields that require vocal tract geometry, such as speech, singing, and human beatboxing.
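
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `p2cp_rms` and `jaccard_index` are hypothetical, the one-directional form of the distance (from predicted points to the target contour) is an assumption, and the abstract does not say how contours are sampled or how pixels are converted to millimetres. Distances are returned in whatever units the input coordinates use.

```python
import math


def p2cp_rms(pred, target):
    """Root mean square of point-to-closest-point distances.

    For each (x, y) point in `pred`, take the distance to its nearest
    point in `target`, then return the RMS of those distances.
    Assumption: one-directional (pred -> target) variant.
    """
    if not pred or not target:
        raise ValueError("both contours must contain at least one point")
    closest_sq = [
        min((px - tx) ** 2 + (py - ty) ** 2 for tx, ty in target)
        for px, py in pred
    ]
    return math.sqrt(sum(closest_sq) / len(closest_sq))


def jaccard_index(mask_a, mask_b):
    """Intersection over union of two binary masks given as nested lists."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (1.0, 0.0)]
    reference = [(0.0, 1.0), (1.0, 1.0)]
    print(p2cp_rms(predicted, reference))

    mask_pred = [[1, 1], [0, 0]]
    mask_ref = [[1, 0], [1, 0]]
    print(jaccard_index(mask_pred, mask_ref))
```

The Jaccard index only makes sense for regions, which is consistent with the abstract reporting it only for articulators annotated as closed contours (the thyroid cartilage and vocal folds); open contours are compared with the point-based distance instead.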

Estimation of vocal tract shapes from speech sounds with a physiological articulatory model

Estimation of Vocal Tract Area Function for Mandarin Vowel Sequences Using MRI

An articulatory model of standard Chinese using MRI and X-ray movie

A Study of Mandarin Chinese Using X-Ray and MRI

Radius Vector-Driven 3-D Mandarin Vocal Tract Model

Automatic segmentation of vocal tract articulators in real-time magnetic resonance imaging

A Speech-Driven 3-D Tongue Model with Realistic Movement in Mandarin Chinese.

Unsupervised Acoustic-to-Articulatory Inversion with Variable Vocal Tract Anatomy

A statistical shape space model of the palate surface trained on 3D MRI scans of the vocal tract

A 3D biomechanical vocal tract model to study speech production control: How to take into account the gravity?

A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract

Laryngeal Muscular Control of Vocal Fold Posturing: Numerical Modeling and Experimental Validation.

A 3D Geometry Model of Vocal Tract Based on Smart Internet of Things

Effect of Medial Surface Shape on Voice Production in a MRI-based Three-Dimensional Phonation Model

Vocal Tract Area Estimation by Gradient Descent

Numerical Investigation of the Influence of Thyroarytenoid and Cricothyroid Muscle Contraction on the Geometry and Biomechanical Properties of the Vocal Folds

A Hybrid Method for Acoustic Analysis of the Vocal Tract During Vowel Production.

Speech-Based Parameter Estimation of an Asymmetric Vocal Fold Oscillation Model and Its Application in Discriminating Vocal Fold Pathologies

Voice Production in a MRI-based Subject-Specific Vocal Fold Model with Parametrically Controlled Medial Surface Shape

Unsupervised Inference of Physiologically Meaningful Articulatory Trajectories with VocalTractLab

A new method of reconstructing the human laryngeal architecture using micro-MRI.