Abstract: Speech recognition has progressed tremendously within artificial intelligence (AI). However, the performance of real-time, offline Chinese speech recognition neural network accelerators for edge AI still needs improvement. This paper proposes a configurable convolutional neural network accelerator based on a lightweight speech recognition model, which dramatically reduces hardware resource consumption while maintaining an acceptable error rate. In the convolutional layers, the weights are binarized to reduce the number of model parameters and improve computational and storage efficiency. A multichannel shared computation (MCSC) architecture is proposed to maximize the reuse of weight and feature map data. A binary weight-sharing processing engine (PE) is designed so that the design is not constrained by the number of available multipliers. A custom instruction set, built around the variable length of voice input, configures parameters so that the accelerator can adapt to different network structures. Finally, ping-pong storage is used for the input feature maps. We implemented the accelerator on a Xilinx ZYNQ XC7Z035 at a working frequency of 150 MHz. The processing times for 2.24 s and 8 s of speech were 69.8 ms and 189.51 ms, respectively, and the convolution performance reached 35.66 GOPS/W. Compared with other computing platforms, the accelerator performs better in terms of energy efficiency, power consumption, and hardware resource consumption.
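
To illustrate why binarized weights relieve the dependence on multipliers, the following is a minimal software sketch, not the accelerator's actual datapath. It assumes plain sign binarization (weights mapped to +1 or -1, with no scaling factor), which the abstract does not specify; the names `binarize` and `binary_conv1d` are hypothetical. With weights restricted to ±1, each multiply-accumulate reduces to an addition or a subtraction.

```python
def binarize(weights):
    """Map each real-valued weight to +1 or -1 by its sign (zero maps to +1).

    Assumption: simple sign binarization without a scaling factor.
    """
    return [1 if w >= 0 else -1 for w in weights]


def binary_conv1d(signal, bin_kernel):
    """Valid-mode 1-D correlation that uses only additions and subtractions."""
    k = len(bin_kernel)
    out = []
    for start in range(len(signal) - k + 1):
        acc = 0
        for offset, sign in enumerate(bin_kernel):
            x = signal[start + offset]
            # A +1 weight adds the input and a -1 weight subtracts it,
            # so no multiplier is involved.
            acc = acc + x if sign > 0 else acc - x
        out.append(acc)
    return out


kernel = binarize([0.7, -0.2, 0.05])      # [1, -1, 1]
features = [3, 1, 4, 1, 5]
result = binary_conv1d(features, kernel)  # [6, -2, 8]
```

Each binarized weight also fits in a single bit, which is the source of the parameter and storage savings the abstract mentions.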

Compression of speech database by feature separation and pattern clustering using STRAIGHT

Optimization of Pitch Preprocessing in TETRA Speech Encoder

A Chinese Voice Morphing Method Based on STRAIGHT

Distributed Steganalysis of Compressed Speech

An Improved Spectral And Prosodic Transformation Method In Straight-Based Voice Conversion

A novel voice conversion system based on codebook mapping with phoneme-tied weighting

Learning Prosodic Patterns for Mandarin Speech Synthesis

Mechanical vibration signal compression based on speech codecs for intelligent manufacturing

Multi-domain speech compression based on wavelet packet transform

Steganalysis of Analysis-By-Synthesis Speech Exploiting Pulse-Position Distribution Characteristics

Phoneme Dependent Speaker Embedding And Model Factorization For Multi-Speaker Speech Synthesis And Adaptation

One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

Localized Mandarin Speech Synthesis Services For Enterprise Scenarios

Low-Latency Deep Clustering For Speech Separation

Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion

Combining Speech Enhancement and Discriminative Feature Extraction for Robust Speaker Recognition

Speech separation method and system

Mandarin Stress Analysis And Prediction For Speech Synthesis

Efficient Binary Weight Convolutional Network Accelerator for Speech Recognition

General Steganalysis Method of Compressed Speech under Different Standards