Abstract: This technical report describes the results of a collaboration between the NLP research group at the University of Tartu and the Institute of the Estonian Language (EKI) on improving neural speech synthesis for Estonian. The report (written in Estonian) presents three main results. (1) Speech synthesis data from 6 speakers, totaling 92.4 hours, was collected and openly released (CC-BY-4.0); the data is available at https://konekorpus.tartunlp.ai and https://www.eki.ee/litsents/. (2) Software and models for neural speech synthesis were released as open source (MIT license) at https://koodivaramu.eesti.ee/tartunlp/text-to-speech. (3) We evaluated the new models and compared them with existing solutions: HMM-based HTS models from EKI (http://www.eki.ee/heli/) and Google's speech synthesis for Estonian (accessed via https://translate.google.com). The evaluation includes voice acceptability MOS scores for sentence-level and longer excerpts, a detailed error analysis, and an evaluation of the pre-processing module.

Advanced Rich Transcription System for Estonian Speech

Open source platform for Estonian speech transcription

Cambridge University Transcription Systems for the Multi-Genre Broadcast Challenge

Neural Speech Synthesis for Estonian

Manual Post-editing of Automatically Transcribed Speeches from the Icelandic Parliament - Althingi

DARTS: Dialectal Arabic Transcription System

MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription

Modelling of a Speech-to-Text Recognition System for Air Traffic Control and NATO Air Command

Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech

TST: Time-Sparse Transducer for Automatic Speech Recognition

Attention-based Transducer for Online Speech Recognition

FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator

Transsion TSUP's speech recognition system for ASRU 2023 MADASR Challenge

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

LSTM-TDNN with convolutional front-end for Dialect Identification in the 2019 Multi-Genre Broadcast Challenge

Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

Multilingual Multiaccented Multispeaker TTS with RADTTS

Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge

Extending RNN-T-based speech recognition systems with emotion and language classification