A Depthwise Separable Convolution Neural Network for Small-footprint Keyword Spotting Using Approximate MAC Unit and Streaming Convolution Reuse

Yicheng Lu, Weiwei Shan, Jiaming Xu
DOI: https://doi.org/10.1109/apccas47518.2019.8953096
2019-01-01
Abstract: In recent years, voice wake-up technology has entered everyday use in many applications, and its core technique is Keyword Spotting (KWS). A keyword spotting system must listen to ambient sound continuously and be ready to wake up at any time, which requires both low power consumption and high recognition accuracy. This paper focuses on reducing the power consumption of real-time keyword spotting systems. A deep neural network model based on Depthwise Separable Convolution (DS-Conv) is constructed and trained on Google's speech commands dataset (GSCD). We propose an Approximate Multiply and Accumulate Unit (AP-MAC) and a data reuse method called Streaming Convolution Reuse (SCR). We show that, on the KWS task, the neural network with AP-MACs saves 37.7% to 42.6% of computing power and achieves a Word Error Rate (WER) similar to that of the same model using traditional MAC units. In addition, SCR allows the model to reuse convolution results across multiple audio frames and saves 94% of activation storage. Combining the two methods reduces the computing power and the memory storage per audio frame of the baseline model by 98.5% to 98.7% and by 94%, respectively.
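The abstract does not describe how SCR is implemented, so the following is only a minimal sketch of the general idea of reusing convolution results across overlapping audio frames, not the authors' design. It assumes a 1-D convolution along time over per-frame feature vectors, and all names (`conv_at`, `full_window_outputs`, `StreamingConv`), the kernel values, and the window length are hypothetical. When the analysis window slides by one frame, all but one of the convolution outputs are unchanged, so a streaming implementation can cache them and compute only the newest one.

```python
from collections import deque


def conv_at(frames, kernel, t):
    """Convolution output at time t, using frames[t : t + K] (K = kernel length)."""
    return sum(
        kernel[j][i] * frames[t + j][i]
        for j in range(len(kernel))
        for i in range(len(kernel[0]))
    )


def full_window_outputs(frames, kernel):
    """Baseline: recompute every output of the window from scratch."""
    k = len(kernel)
    return [conv_at(frames, kernel, t) for t in range(len(frames) - k + 1)]


class StreamingConv:
    """Hypothetical streaming variant: one new output per incoming frame."""

    def __init__(self, kernel, window):
        self.kernel = kernel
        self.k = len(kernel)
        # Only the last K input frames are needed to compute the newest output.
        self.recent = deque(maxlen=self.k)
        # Cached outputs covering the current window; the oldest drops out.
        self.outputs = deque(maxlen=window - self.k + 1)

    def push(self, frame):
        self.recent.append(frame)
        if len(self.recent) == self.k:
            self.outputs.append(conv_at(list(self.recent), self.kernel, 0))
        return list(self.outputs)


# Illustrative data only: 20 frames of 4 integer features, a 3-frame kernel,
# and an 8-frame analysis window.
frames = [[(3 * t + i) % 7 for i in range(4)] for t in range(20)]
kernel = [[1, -2, 0, 1], [2, 1, -1, 0], [0, 1, 1, -1]]
window = 8

stream = StreamingConv(kernel, window)
for n, frame in enumerate(frames):
    cached = stream.push(frame)
    if n + 1 >= window:
        # The cached results match a full recomputation over the same window.
        assert cached == full_window_outputs(frames[n + 1 - window : n + 1], kernel)
```

In this sketch the baseline computes `window - K + 1` outputs per incoming frame while the streaming version computes one, and it keeps only the last `K` input frames rather than the whole window. The savings reported in the abstract depend on the actual network and hardware, which are not modeled here.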