In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms

Philipp Schilk, Niccolò Polvani, Andrea Ronco, Milos Cernak, Michele Magno
DOI: https://doi.org/10.1145/3576842.3582365
2023-09-06
Abstract: The recent ubiquitous adoption of remote conferencing has been accompanied by omnipresent frustration with distorted or otherwise unclear voice communication. Audio enhancement can compensate for low-quality input signals from, for example, small true wireless earbuds, by applying noise suppression techniques. Such processing relies on voice activity detection (VAD) with low latency and the added capability of discriminating the wearer's voice from others, a task of significant computational complexity. The tight energy budget of devices as small as modern earphones, however, requires any system attempting to tackle this problem to do so with minimal power and processing overhead, while not relying on speaker-specific voice samples and training due to usability concerns. This paper presents the design and implementation of a custom research platform for low-power wireless earbuds based on novel, commercial, MEMS bone-conduction microphones. Such microphones can record the wearer's speech with much greater isolation, enabling personalized voice activity detection and further audio enhancement applications. Furthermore, the paper accurately evaluates a proposed low-power personalized speech detection algorithm based on bone-conduction data and a recurrent neural network running on the implemented research platform. This algorithm is compared to an approach based on traditional microphone input. The performance of the bone-conduction system, which detects speech within 12.8 ms at an accuracy of 95%, is evaluated. Different SoC choices are contrasted, with the final implementation based on the cutting-edge Ambiq Apollo 4 Blue SoC achieving 2.64 mW average power consumption at 14 µJ per inference, reaching 43 h of battery life on a miniature 32 mAh Li-ion cell without duty cycling.
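The battery-life figure can be sanity-checked with simple arithmetic: ideal runtime is the energy stored in the cell divided by the average power draw. The sketch below assumes a nominal Li-ion cell voltage of 3.6 V and loss-free power conversion; neither assumption comes from the paper, and the helper name `battery_life_hours` is hypothetical.

```python
def battery_life_hours(capacity_mah: float, cell_voltage_v: float, avg_power_mw: float) -> float:
    """Ideal runtime: stored energy (mWh) divided by average draw (mW).

    Ignores converter losses, capacity derating and cut-off voltage.
    """
    energy_mwh = capacity_mah * cell_voltage_v
    return energy_mwh / avg_power_mw


# Assumed nominal Li-ion voltage (3.6 V); the paper's exact assumption may differ.
hours = battery_life_hours(32.0, 3.6, 2.64)
print(f"{hours:.1f} h")  # 43.6 h
```

At 3.6 V this gives about 43.6 h, in line with the reported 43 h; assuming 3.7 V instead gives about 44.8 h, so the exact figure depends on the assumed cell voltage and conversion losses.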
Audio and Speech Processing, Machine Learning
What problem does this paper attempt to address?
This paper investigates the use of low-power bone-conduction microphones in small true wireless earbuds to enhance audio and improve the clarity of voice communication. Conventional (air-conduction) microphones in earbuds sit far from the wearer's mouth, so they capture the voice at low quality and pick up considerable environmental noise. Bone-conduction microphones isolate the wearer's voice much better, which makes them well suited to personalized voice activity detection (pVAD) and further audio enhancement applications. The main contributions of the paper are:
1. The design and implementation of a research platform for low-power wireless earbuds based on a novel commercial MEMS bone-conduction microphone.
2. The development and evaluation of an ultra-low-power, bone-conduction-based pVAD algorithm, together with an exploration of further energy-saving opportunities.
3. A comparison of the industry-standard Nordic nRF5340 and the Ambiq Apollo 4 Blue SoCs for ultra-low-power edge processing. The Apollo 4 Blue was selected for the final implementation, reaching an average power consumption of 2.64 mW and running for 43 hours on a tiny 32 mAh Li-ion battery without duty cycling.

The paper also notes that although prior work exists on low-power VAD, these methods generally cannot distinguish the target speaker from other speakers, which makes them unsuitable for speech enhancement in earbuds. Bone-conduction microphones offer a new way to isolate the wearer's voice and, combined with TinyML techniques, reduce processing requirements without requiring speaker-specific training samples. In addition, the authors recorded their own dataset containing bone-conduction, air-conduction, and external-noise signals for model training and testing, and designed a lightweight pVAD model based on a recurrent neural network with approximately 5000 parameters, suitable for resource-constrained hardware.
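To make the roughly 5000-parameter budget concrete, here is a minimal sketch of a streaming recurrent frame classifier of that size. It is not the paper's model: the GRU cell, the 16 input features per frame, the 32-unit hidden state, and all names (`TinyGruVad`, `step`, `N_FEATURES`, `N_HIDDEN`) are hypothetical choices that happen to total 4737 parameters, and the weights are random rather than trained. What it illustrates is why an RNN suits low-latency VAD: each new frame updates a small carried-over state and immediately yields a speech probability, without buffering a long window.

```python
import math
import random

# Hypothetical sizes, chosen only to land near a ~5000-parameter budget.
N_FEATURES = 16  # assumed per-frame features from the bone-conduction signal
N_HIDDEN = 32    # assumed GRU state size


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class TinyGruVad:
    """Single-layer GRU plus a logistic output, evaluated one frame at a time."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        s = 1.0 / math.sqrt(n_hidden)

        def mat(rows: int, cols: int) -> list[list[float]]:
            return [[rng.uniform(-s, s) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        # Input and recurrent weights, stacked as [update z; reset r; candidate n].
        self.W = mat(3 * n_hidden, n_in)
        self.U = mat(3 * n_hidden, n_hidden)
        self.b = [0.0] * (3 * n_hidden)
        # Output layer: hidden state -> single speech logit.
        self.w_out = [rng.uniform(-s, s) for _ in range(n_hidden)]
        self.b_out = 0.0
        self.h = [0.0] * n_hidden  # recurrent state carried across frames

    def num_params(self) -> int:
        return (len(self.W) * len(self.W[0]) + len(self.U) * len(self.U[0])
                + len(self.b) + len(self.w_out) + 1)

    def step(self, x: list[float]) -> float:
        """Consume one feature frame and return P(wearer is speaking)."""
        H = self.n_hidden
        gx = [a + b for a, b in zip(matvec(self.W, x), self.b)]
        gh = matvec(self.U, self.h)
        new_h = []
        for i in range(H):
            z = sigmoid(gx[i] + gh[i])                         # update gate
            r = sigmoid(gx[H + i] + gh[H + i])                 # reset gate
            n = math.tanh(gx[2 * H + i] + r * gh[2 * H + i])   # candidate state
            new_h.append((1.0 - z) * n + z * self.h[i])
        self.h = new_h
        logit = sum(w * h for w, h in zip(self.w_out, self.h)) + self.b_out
        return sigmoid(logit)


# Usage: stream ten random frames through the (untrained) model.
vad = TinyGruVad(N_FEATURES, N_HIDDEN)
frame_rng = random.Random(1)
frames = [[frame_rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(10)]
probs = [vad.step(f) for f in frames]
decisions = [p > 0.5 for p in probs]
print(vad.num_params())  # 4737
```

On a microcontroller the same per-frame update would typically be written in C with quantized weights; the Python version only shows the data flow and the parameter arithmetic (3 × 32 × (16 + 32) recurrent and input weights, 96 biases, and a 33-parameter output layer).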
Experimental results show that the bone-conduction system outperforms the conventional microphone input in speech detection, with higher accuracy and robustness, detecting speech within 12.8 ms at 95% accuracy.