Automatic segmentation of vocal tract articulators in real-time magnetic resonance imaging
Vinicius Ribeiro, Karyna Isaieva, Justine Leclere, Jacques Felblinger, Pierre-André Vuissoz, Yves Laprie
DOI: https://doi.org/10.1016/j.cmpb.2023.107907
IF: 6.1
2024-01-01
Computer Methods and Programs in Biomedicine
Abstract: BACKGROUND AND OBJECTIVES: The characterization of the vocal tract geometry during speech interests various research topics, including speech production modeling, motor control analysis, and speech therapy design. Real-time MRI is a reliable and non-invasive tool for this purpose. In most cases, it is necessary to know the contours of the individual articulators from the glottis to the lips. Several techniques have been proposed for segmenting vocal tract articulators, but most are limited to specific applications. Moreover, they often do not provide individualized contours for all soft-tissue articulators in a multi-speaker configuration. METHODS: A Mask R-CNN network was trained to detect and segment the vocal tract articulator contours in two real-time MRI (RT-MRI) datasets with speech recordings of multiple speakers. Two post-processing algorithms were then proposed to convert the network's outputs into geometrical curves. Nine articulators were considered: the two lips, tongue, soft palate, pharynx, arytenoid cartilage, epiglottis, thyroid cartilage, and vocal folds. A leave-one-out cross-validation protocol was used to evaluate inter-speaker generalization. The evaluation metrics were the point-to-closest-point distance and the Jaccard index (for articulators annotated as closed contours). RESULTS: The proposed method accurately segmented the vocal tract articulators, with an average root mean square point-to-closest-point distance (P2CP_RMS) of less than 2.2 mm for all the articulators in the leave-one-out cross-validation setting. The minimum P2CP_RMS was 0.91 mm for the upper lip, and the maximum was 2.18 mm for the tongue. The Jaccard indices for the thyroid cartilage and vocal folds were 0.60 and 0.61, respectively. Additionally, the method adapted to a new subject with only ten annotated samples. CONCLUSIONS: Our research introduced a method for individually segmenting nine non-rigid vocal tract articulators in real-time MRI movies.
The software is openly available as an installable package to the speech community. It is designed to develop speech applications and clinical and non-clinical research in fields that require vocal tract geometry, such as speech, singing, and human beatboxing.
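Illustrative note (not part of the published abstract): the two evaluation metrics named above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `p2cp_rms` and `jaccard_index` are hypothetical, the distance is computed in one direction only (from predicted points to the reference contour), contour coordinates are assumed to be already expressed in millimetres, and masks are assumed to be equally sized binary grids. The paper may define or aggregate these quantities differently.

```python
import math


def p2cp_rms(pred, target):
    """Directed RMS point-to-closest-point distance from pred to target.

    pred and target are sequences of (x, y) points. Assumption: the
    coordinates are in millimetres and only the pred -> target direction
    is evaluated.
    """
    if not pred or not target:
        raise ValueError("both contours must contain at least one point")
    squared = []
    for px, py in pred:
        # Distance from this predicted point to its nearest reference point.
        closest = min(math.hypot(px - tx, py - ty) for tx, ty in target)
        squared.append(closest ** 2)
    return math.sqrt(sum(squared) / len(squared))


def jaccard_index(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks.

    Masks are nested lists of 0/1 values (hypothetical representation).
    Two empty masks are treated as identical and score 1.0.
    """
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += int(bool(a) and bool(b))
            union += int(bool(a) or bool(b))
    return inter / union if union else 1.0


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    reference = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    print(p2cp_rms(predicted, reference))

    mask_pred = [[1, 1], [0, 0]]
    mask_ref = [[1, 0], [1, 0]]
    print(jaccard_index(mask_pred, mask_ref))
```

In this reading, the Jaccard index only applies to articulators annotated as closed contours, since an open curve does not enclose a region that can be rasterized into a mask.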
engineering, biomedical; computer science, interdisciplinary applications; medical informatics, theory & methods