Abstract: At present, text-to-speech (TTS) systems that are trained with high-quality transcribed speech data using end-to-end neural models can generate speech that is intelligible, natural, and closely resembles human speech. These models are trained with relatively large amounts of single-speaker, professionally recorded audio, typically extracted from audiobooks. Meanwhile, due to the scarcity of freely available speech corpora of this kind, a large gap exists in Arabic TTS research and development. Most of the existing freely available Arabic speech corpora are not suitable for TTS training because they contain multi-speaker casual speech with variations in recording conditions and quality, whereas the corpora curated for speech synthesis are generally small and not suitable for training state-of-the-art end-to-end models. In a move towards filling this gap in resources, we present a speech corpus for Classical Arabic Text-to-Speech (ClArTTS) to support the development of end-to-end TTS systems for Arabic. The speech is extracted from a LibriVox audiobook and is then processed, segmented, and manually transcribed and annotated. The final ClArTTS corpus contains about 12 hours of speech from a single male speaker sampled at 40.1 kHz (40,100 Hz). In this paper, we describe the process of corpus creation and provide details of corpus statistics and a comparison with existing resources. Furthermore, we develop two TTS systems based on Grad-TTS and Glow-TTS and illustrate the performance of the resulting systems via subjective and objective evaluations. The corpus will be made publicly available at http://www.clartts.com for research purposes, along with a demo of the baseline TTS systems.

Librispeech: An ASR corpus based on public domain audio books

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

MLS: A Large-Scale Multilingual Dataset for Speech Research

LibriS2S: A German-English Speech-to-Speech Translation Corpus

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation

The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage

Common Voice: A Massively-Multilingual Speech Corpus

LibriVoxDeEn: A Corpus for German-to-English Speech Translation and German Speech Recognition

A Speech Test Set of Practice Business Presentations with Additional Relevant Texts

Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning

An Anechoic, High-Fidelity, Multidirectional Speech Corpus

LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models

Hearing voices at the National Library -- a speech corpus and acoustic model for the Swedish language

MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research

LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework

ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus

HarperValleyBank: A Domain-Specific Spoken Dialog Corpus

TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline

ATCSpeech: A Multilingual Pilot-Controller Speech Corpus from Real Air Traffic Control Environment

Updated Corpora and Benchmarks for Long-Form Speech Recognition