Energy-Friendly Keyword Spotting System Using Add-Based Convolution.

Hang Zhou,Wenchao Hu,Yu Ting Yeung,Xiao Chen
DOI: https://doi.org/10.21437/Interspeech.2021-458
2021-01-01
Abstract:Wake-up keyword of a keyword spotting (KWS) system represents brand name of a smart device. Performance of KWS is also crucial for modern speech based human-device interaction. An on-device KWS with both high accuracy and low power consumption is desired. We propose a KWS with add-based convolution layers, namely Add TC-ResNet. Add-based convolution paves a new way to reduce power consumption of KWS system, as addition is more energy efficient than multiplication at hardware level. On Google Speech Commands dataset V2, Add TC-ResNet achieves an accuracy of 97.1%, with 99% of multiplication operations are replaced by addition operations. The result is competitive to a state-of-the-art fully multiplication-based TC-ResNet KWS. We also investigate knowledge distillation and a mixed addition-multiplication design for the proposed KWS, which leads to further performance improvement.
What problem does this paper attempt to address?