An Energy-Efficient Binarized Neural Network Using Analog-Intensive Feature Extraction for Keyword and Speaker Verification Wakeup.

Ning Pu, Kang Zhao, Syed Muhammad Abubakar, Yue Yin, Hanjun Jiang, Xiaofeng Yang, Zhihua Wang
DOI: https://doi.org/10.1109/iscas51556.2021.9401324
2021-01-01
Abstract: This paper proposes an analog-intensive voice feature extraction method based on time-domain filtering, which addresses the computational complexity of Mel Frequency Cepstral Coefficient (MFCC) extraction. Compared with MFCC extraction, the estimated power consumption of the proposed feature extraction is reduced by approximately 4×. In addition, a binarized neural network model with few parameters, called DSXNOR-Net, is proposed together with the generalized end-to-end (GE2E) loss function for speaker verification. The feature extraction method and DSXNOR-Net are verified in Python. The proposed architecture achieves a best recognition accuracy of 98.3% on an English spoken digits corpus for keyword spotting (KWS) and 94.4% on the TIMIT corpus for speaker verification (SV). Compared with state-of-the-art results, the total estimated power consumption is reduced by approximately 2× for SV and 4× for KWS.
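For readers unfamiliar with binarized networks, the sketch below illustrates the general XNOR-and-popcount identity that such networks commonly rely on to replace multiply-accumulate operations. It is not the DSXNOR-Net architecture, whose details the abstract does not give; the function names (`binarize`, `pack_bits`, `xnor_popcount_dot`) and the 64-element vector length are assumptions made purely for illustration.

```python
import random


def binarize(values):
    """Map real values to {-1, +1} by sign (0 is mapped to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a {-1, +1} vector into an integer: bit i is 1 where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word, b_word, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    XNOR marks the positions where the signs agree, so
    dot = (#agreements) - (#disagreements) = 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agreements = bin(~(a_word ^ b_word) & mask).count("1")
    return 2 * agreements - n


# Illustrative data only: random weights and activations (hypothetical sizes).
random.seed(0)
n = 64
weights = binarize([random.gauss(0.0, 1.0) for _ in range(n)])
inputs = binarize([random.gauss(0.0, 1.0) for _ in range(n)])

reference = sum(w * x for w, x in zip(weights, inputs))
result = xnor_popcount_dot(pack_bits(weights), pack_bits(inputs), n)
```

Here `result` equals `reference`, which is why a binarized layer can be evaluated with bitwise logic and a bit count instead of multipliers.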