Abstract: This paper presents a French text-to-speech synthesis system for the Blizzard Challenge 2023. The challenge consists of two tasks: generating high-quality speech from female speakers (the Hub task) and generating speech that closely resembles specific individuals (the Spoke task). We first screened the competition data to remove missing or erroneous text. We organized all non-phoneme symbols and removed those that had no pronunciation or zero duration. We also added word-boundary and start/end symbols to the text, which, in our previous experience, improves speech quality. For the Spoke task, we performed data augmentation in accordance with the competition rules. We used an open-source G2P model to transcribe the French texts into phonemes. Because the G2P model uses the International Phonetic Alphabet (IPA), we applied the same transcription process to the provided competition data for standardization. However, due to compiler limitations in recognizing special symbols from the IPA chart, we converted all phonemes, following the rules, into the phonetic scheme used in the competition data. Finally, we resampled all competition audio to a uniform sampling rate of 16 kHz. We employed a VITS-based acoustic model with the HiFi-GAN vocoder. For the Spoke task, we trained a multi-speaker model and incorporated speaker information into the duration predictor, vocoder, and flow layers of the model. In the evaluation, our system obtained a quality MOS of 3.6 for the Hub task and 3.4 for the Spoke task, placing it at an average level among all participating teams.
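The text front-end steps described in the abstract (dropping zero-duration symbols, converting IPA symbols to a target phonetic scheme, and adding word-boundary and start/end symbols) could look roughly like the following minimal sketch. The token names `<bos>`, `<eos>`, and `#`, the function `build_sequence`, and the small `IPA_TO_TARGET` table are all hypothetical choices made for illustration; the abstract does not specify the actual symbols or the mapping used in the competition data.

```python
from typing import Dict, List, Tuple

# Hypothetical marker tokens; the abstract does not name the real ones.
BOS, EOS, WORD_BOUNDARY = "<bos>", "<eos>", "#"

# Illustrative IPA-to-target mapping only, not the competition's actual table.
IPA_TO_TARGET: Dict[str, str] = {"ʁ": "R", "ʒ": "Z", "ɛ": "E", "ɔ": "O"}

# A word is a list of (phoneme, duration in seconds) pairs.
Word = List[Tuple[str, float]]


def build_sequence(words: List[Word]) -> List[str]:
    """Build a model input sequence from aligned words.

    Symbols with zero duration are dropped, IPA symbols are mapped to the
    target scheme (unknown symbols pass through unchanged), a boundary token
    is placed between words, and start/end tokens wrap the utterance.
    """
    seq = [BOS]
    for word in words:
        kept = [IPA_TO_TARGET.get(p, p) for p, dur in word if dur > 0.0]
        if not kept:
            continue  # the whole word had no audible symbols
        if len(seq) > 1:
            seq.append(WORD_BOUNDARY)  # separate from the previous word
        seq.extend(kept)
    seq.append(EOS)
    return seq


example = [
    [("ʒ", 0.08), ("ɛ", 0.10)],
    [("ʁ", 0.07), ("_", 0.0), ("ɔ", 0.09)],  # "_" has zero duration
]
print(build_sequence(example))
```

In this sketch the zero-duration symbol `_` is removed before the boundary and start/end tokens are added, so the markers never sit next to symbols the acoustic model cannot align.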

The FruitShell French synthesis system at the Blizzard 2023 Challenge
