Abstract: Feature extraction is an essential part of automatic speech recognition (ASR): it compresses raw speech data and enhances features. Conventional implementations in the digital domain have run into energy-consumption and processing-speed bottlenecks. We therefore propose a Mixed-Signal Processing (MSP) architecture to efficiently extract Mel-Frequency Cepstrum Coefficient (MFCC) features. MSP-MFCC pre-processes speech signals in the analog domain, which significantly reduces the cost of the analog-to-digital converter (ADC) as well as the computational complexity of the digital backend. Moreover, by improving the processing flow, MSP-MFCC eliminates the time-consuming Fourier transform of the conventional digital realization. We fabricated the analog part in 180 nm CMOS mixed-signal technology and measured the chip. The measured results show that the energy consumption of MSP-MFCC is 0.72 μJ/frame and that the processing speed reaches 45.79 μs/frame. MSP-MFCC achieves a 95% energy saving and about a 6.4× speedup over the state of the art. Furthermore, using the features extracted by MSP-MFCC, speech recognition simulation reaches an accuracy of 98.2%, which keeps it ahead of its current counterparts. The proposed MFCC extractor is competitive for integration in ultra-low-power, always-on wearable speech recognition applications.
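
For orientation, the conventional digital pipeline that the abstract contrasts against can be sketched as follows. This is a minimal textbook MFCC computation for a single frame, not the MSP-MFCC method and not the authors' code. The function names, the 8 kHz sample rate, the 256-sample frame, the 20 mel filters and the 13 coefficients are all illustrative assumptions, and a naive DFT stands in for the FFT so that the sketch needs only the standard library.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Common mel-scale mapping (O'Shaughnessy form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame, then return |DFT|^2 for bins 0..N/2.

    The naive O(N^2) DFT below is the Fourier-transform stage that a
    digital implementation would normally compute with an FFT.
    """
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top_mel * m / (num_filters + 1))
                for m in range(num_filters + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, num_filters=20, num_coeffs=13):
    """Window -> DFT power spectrum -> mel filterbank -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Small offset avoids log(0) for empty bands.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energies))
            for c in range(num_coeffs)]


# Hypothetical input: one 256-sample frame of a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2.0 * math.pi * 1000.0 * i / SAMPLE_RATE) for i in range(256)]
coeffs = mfcc_frame(frame, SAMPLE_RATE)
```

In this sketch every sample must be digitized before any processing happens, and the Fourier-transform stage dominates the per-frame arithmetic. Those are the two costs the abstract says MSP-MFCC reduces by moving pre-processing into the analog domain.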

Design of an Ultra-Low Power MFCC Feature Extraction Circuit with Embedded Speech Activity Detector

MSP-MFCC: Energy-Efficient MFCC Feature Extraction Method With Mixed-Signal Processing Architecture for Wearable Speech Recognition Applications

Energy-efficient MFCC Extraction Architecture in Mixed-Signal Domain for Automatic Speech Recognition

A $2.81\mu \mathrm{W}$, Energy Efficient MFCC Feature Extractor for Keyword-Spotting in 65nm CMOS.

AAD-KWS: A Sub-$\mu\mathrm{W}$ Keyword Spotting Chip with a Zero-Cost, Acoustic Activity Detector from a 170nW MFCC Feature Extractor in 28nm CMOS

Precision Adaptive MFCC Based on R2SDF-FFT and Approximate Computing for Low-Power Speech Keywords Recognition

Optimization and evaluation of energy-efficient mixed-signal MFCC feature extraction architecture

A Novel and Efficient Voice Activity Detector Using Shape Features of Speech Wave.

A CMOS Low Power K-band FMCW Radar Transceiver Front-End for AIOT Application

AAD-KWS: A Sub-μW Keyword Spotting Chip with an Acoustic Activity Detector Embedded in MFCC and a Tunable Detection Window in 28-nm CMOS

14.1 A 510nW 0.41V Low-Memory Low-Computation Keyword-Spotting Chip Using Serial FFT-Based MFCC and Binarized Depthwise Separable Convolutional Neural Network in 28nm CMOS

Low-Power Keyword Recognition Feature Extraction Circuit Based on SRMFCC and Shared Multiplier for High Noise Background

A 608nW Near-Microphone Keyword-Spotting Chip Using Real-Point Serial FFT-Based MFCC and Temporal Depthwise Separable CNN in 28nm CMOS

A Low-Power Keyword Spotting System with High-Order Passive Switched-Capacitor Bandpass Filters for Analog-MFCC Feature Extraction

A 0.61-μW Fully Integrated Keyword-Spotting ASIC with Real-Point Serial FFT-Based MFCC and Temporal Depthwise Separable CNN

VoAD: A Sub-μW Multiscene Voice Activity Detector Deploying Analog-Frontend Digital-Backend Circuits

A 510-nW Wake-Up Keyword-Spotting Chip Using Serial-FFT-Based MFCC and Binarized Depthwise Separable CNN in 28-nm CMOS

Sound Event Detection on A Single MCU: Lightweight, Low-power and Portable

A Fast Algorithm for Extracting Speech Feature Parameter MFCC on Microcontroller

Nanowatt Acoustic Inference Sensing Exploiting Nonlinear Analog Feature Extraction