Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health

Apiwat Ditthapron, Emmanuel O. Agu, Adam C. Lammert
DOI: https://doi.org/10.21437/Interspeech.2023-1026
2023-08-16
Abstract: Modern smartphones have the hardware to acquire audio and to perform speech processing tasks such as speaker recognition and health assessment. However, energy consumption remains a concern, especially for resource-intensive deep neural networks (DNNs). Prior work has improved DNN energy efficiency by using a compact model or by reducing the dimensions of speech features. Both approaches reduced energy consumption during DNN inference but not during speech acquisition. This paper proposes a masking kernel, integrated into gradient descent during DNN training, that learns the most energy-efficient speech length and sampling rate for windowing, a common step in sample construction. To determine the energy-optimal parameters, a masking function with non-zero derivatives was combined with a low-pass filter. The proposed approach reduces the energy consumption of both data collection and inference by 57% and is competitive with speaker recognition and traumatic brain injury detection baselines.
Audio and Speech Processing, Sound
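
The abstract does not specify the masking function or the filter, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a sigmoid as the mask (one possible function whose derivative never vanishes) and applies the same soft mask to FFT bins as a stand-in for the low-pass filter. The names `soft_mask`, `soft_mask_grad`, `soft_lowpass`, and the `sharpness` parameter are hypothetical.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_mask(n, cutoff, sharpness=0.01):
    """Soft mask over indices 0..n-1: close to 1 below `cutoff`, close to 0 above.
    Assumption: a sigmoid stands in for the paper's masking function."""
    return _sigmoid(sharpness * (cutoff - np.arange(n)))

def soft_mask_grad(n, cutoff, sharpness=0.01):
    """Derivative of the mask with respect to `cutoff`.
    Written as s * sigmoid(x) * sigmoid(-x) so it stays strictly positive
    in floating point, which lets gradient descent adjust `cutoff`."""
    x = sharpness * (cutoff - np.arange(n))
    return sharpness * _sigmoid(x) * _sigmoid(-x)

def soft_lowpass(signal, cutoff_bin, sharpness=0.05):
    """Soft low-pass filter: mask the FFT bins above `cutoff_bin`.
    Assumption: a frequency-domain mask stands in for the paper's filter."""
    spectrum = np.fft.rfft(signal)
    mask = soft_mask(spectrum.size, cutoff_bin, sharpness)
    return np.fft.irfft(spectrum * mask, n=signal.size)

# Hypothetical setting: a 1 s window at 16 kHz with a learnable length
# of 8000 samples (0.5 s).
mask = soft_mask(16000, cutoff=8000.0)
grad = soft_mask_grad(16000, cutoff=8000.0)
```

In a training loop, `cutoff` and `cutoff_bin` would be trainable parameters with an energy penalty added to the task loss. How the paper formulates that penalty is not stated in the abstract.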