Abstract:

BACKGROUND AND OBJECTIVES: The characterization of vocal tract geometry during speech is of interest to several research areas, including speech production modeling, motor control analysis, and speech therapy design. Real-time MRI (RT-MRI) is a reliable and non-invasive tool for this purpose. Most applications require the contours of the individual articulators from the glottis to the lips. Several techniques have been proposed for segmenting vocal tract articulators, but most are limited to specific applications. Moreover, they often do not provide individualized contours for all soft-tissue articulators in a multi-speaker configuration.

METHODS: A Mask R-CNN network was trained to detect and segment the vocal tract articulator contours in two RT-MRI datasets with speech recordings of multiple speakers. Two post-processing algorithms were then proposed to convert the network's outputs into geometrical curves. Nine articulators were considered: the two lips, tongue, soft palate, pharynx, arytenoid cartilage, epiglottis, thyroid cartilage, and vocal folds. A leave-one-out cross-validation protocol was used to evaluate inter-speaker generalization. The evaluation metrics were the point-to-closest-point (P2CP) distance and the Jaccard index (for articulators annotated as closed contours).

RESULTS: The proposed method accurately segmented the vocal tract articulators, with an average root mean square P2CP distance (P2CP_RMS) of less than 2.2 mm for all the articulators in the leave-one-out cross-validation setting. The minimum P2CP_RMS was 0.91 mm for the upper lip, and the maximum was 2.18 mm for the tongue. The Jaccard indices for the thyroid cartilage and vocal folds were 0.60 and 0.61, respectively. Additionally, the method adapted to a new subject with only ten annotated samples.

CONCLUSIONS: Our research introduced a method for individually segmenting nine non-rigid vocal tract articulators in real-time MRI movies. The software is openly available to the speech community as an installable package. It is intended to support the development of speech applications and clinical and non-clinical research in fields that require vocal tract geometry, such as speech, singing, and human beatboxing.
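
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `p2cp_rms` and `jaccard_index` are hypothetical, the symmetric form of the point-to-closest-point distance is an assumption (the abstract does not give the exact definition), and contours are assumed to be already expressed in millimetres.

```python
import math


def p2cp_rms(contour_a, contour_b):
    """Symmetric RMS point-to-closest-point distance between two contours.

    Each contour is a list of (x, y) points, assumed to be in millimetres.
    The symmetric formulation is an assumption, not taken from the abstract.
    """
    def squared_closest(src, dst):
        # For every point in src, squared distance to its nearest point in dst.
        return [min((sx - dx) ** 2 + (sy - dy) ** 2 for dx, dy in dst)
                for sx, sy in src]

    squared = squared_closest(contour_a, contour_b) + squared_closest(contour_b, contour_a)
    return math.sqrt(sum(squared) / len(squared))


def jaccard_index(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    if not union:
        return 1.0  # both masks empty: treated as a perfect match in this sketch
    return len(mask_a & mask_b) / len(union)


# Toy contours: two parallel segments 1 mm apart.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
target = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(p2cp_rms(predicted, target))

# Toy closed-contour masks: two 2x2 squares sharing one column of pixels.
pred_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
true_mask = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(jaccard_index(pred_mask, true_mask))
```

In the toy data every closest-point distance is 1 mm, and the masks share 2 of the 6 pixels in their union. The brute-force nearest-point search is quadratic in the number of points, which is acceptable for short contours; a spatial index would be a reasonable substitute for dense ones.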

Estimate Articulatory MRI Series From Acoustic Signal Using Deep Architecture

Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract

A Deep Recurrent Approach for Acoustic-to-articulatory Inversion.

Complete reconstruction of the tongue contour through acoustic to articulatory inversion using real-time MRI data

Silent Speech Decoding Using Spectrogram Features Based on Neuromuscular Activities

Speech neuromuscular decoding based on spectrogram images using conformal predictors with Bi-LSTM.

Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders

Deep Speech Synthesis from MRI-Based Articulatory Representations

Evaluation Of Linear Regression For Speaker Adaptation In HMM-Based Articulatory Movements Estimation

Speaker-Independent Acoustic-to-Articulatory Speech Inversion

Acoustic to Articulatory Mapping with Deep Neural Network

Articulatory-to-acoustic Conversion Using BLSTM-RNNs with Augmented Input Representation.

DNN-based Acoustic-to-Articulatory Inversion using Ultrasound Tongue Imaging

Decoding Vocal Articulations from Acoustic Latent Representations

On the Evaluation of Inversion Mapping Performance in the Acoustic Domain

Unsupervised Acoustic-to-Articulatory Inversion with Variable Vocal Tract Anatomy

Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural Networks

GeO2-SiO2-chitosan-medium-coated hollow optical fiber for cell immobilization.

Articulatory-WaveNet: Autoregressive Model For Acoustic-to-Articulatory Inversion

Silent Speech and Emotion Recognition from Vocal Tract Shape Dynamics in Real-Time MRI

Automatic segmentation of vocal tract articulators in real-time magnetic resonance imaging