Abstract: Low-resource text-to-speech (TTS) synthesis is a promising research direction. Mongolian is the official language of the Inner Mongolia Autonomous Region and is spoken by more than 10 million people worldwide. As a representative low-resource language, however, it has few open-source datasets for TTS. We therefore release MnTTS2, an open-source multi-speaker Mongolian TTS dataset, for researchers in this area. We invited three Mongolian announcers to record speech covering a wide range of topics. Each announcer recorded 10 h of Mongolian speech, for a total of 30 h. In addition, we built two baseline systems based on state-of-the-art neural architectures: a multi-speaker FastSpeech 2 model with a HiFi-GAN vocoder, and a fully end-to-end multi-speaker VITS model. With the FastSpeech 2 + HiFi-GAN system, all three speakers scored 4.0 or higher on both naturalness and speaker similarity. With the VITS model, all three speakers scored 4.5 or higher on both measures. These results show that MnTTS2 can be used to build robust multi-speaker Mongolian TTS models.
