A Hardware-efficient Accelerator for Encoding Stage of Text-to-speech Synthesis

Riyong Zheng, Chenghao Wang, Jun Han, Xiaoyang Zeng
DOI: https://doi.org/10.1109/ASICON47005.2019.8983681
2019-01-01
Abstract: Text-to-speech (TTS) synthesis is a promising human-computer interaction technology. Google launched the TTS model Tacotron, which can directly convert raw text to speech. The encoder module is one of the most important components of Tacotron: it extracts context features from the text and generates time series. The encoder contains both a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN). Few existing hardware accelerators support these hybrid algorithms and their parallel computations. To this end, we designed a hardware-efficient accelerator to accomplish these complex computing tasks. We quantize the network model to reduce hardware overhead and use parallel hardware structures to increase operating speed. The encoder accelerator can process 2903 bits of data per second at 3.627 W. Compared to a Titan X, its energy efficiency is 71 times higher.
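To make the reported figures concrete, the short sketch below works through the arithmetic. It assumes that "energy efficiency" means throughput divided by power (bits per joule), which the abstract does not state explicitly; the function name `energy_efficiency` and the implied GPU figure are illustrative, not values taken from the paper.

```python
def energy_efficiency(throughput_bits_per_s: float, power_w: float) -> float:
    """Throughput per watt, i.e. bits processed per joule (assumed definition)."""
    return throughput_bits_per_s / power_w


# Figures reported in the abstract for the encoder accelerator.
ACCEL_THROUGHPUT_BITS_PER_S = 2903
ACCEL_POWER_W = 3.627
REPORTED_RATIO_VS_TITAN_X = 71

accel_eff = energy_efficiency(ACCEL_THROUGHPUT_BITS_PER_S, ACCEL_POWER_W)

# Back-of-the-envelope GPU efficiency implied by the 71x ratio.
# This is derived under the assumption above, not reported by the authors.
implied_gpu_eff = accel_eff / REPORTED_RATIO_VS_TITAN_X

print(f"Accelerator: {accel_eff:.1f} bits/J")
print(f"Implied Titan X: {implied_gpu_eff:.2f} bits/J")
```

Under this reading, the accelerator delivers roughly 800 bits per joule. If the paper defines the ratio differently (for example, over a different workload or measurement window), the implied GPU number would not apply.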