BC-VAD: A Robust Bone Conduction Voice Activity Detection

Niccolo' Polvani, Damien Ronssin, Milos Cernak
DOI: https://doi.org/10.48550/arXiv.2212.02996
2022-12-06
Abstract: Voice Activity Detection (VAD) is a fundamental module in many audio applications. Recent state-of-the-art VAD systems are often based on neural networks, but they require a computational budget that usually exceeds the capabilities of a small battery-operated device when preserving the performance of larger models. In this work, we rely on the input from a bone conduction microphone (BCM) to design an efficient VAD (BC-VAD) robust against residual non-stationary noises originating from the environment or speakers not wearing the BCM. We first show that a larger VAD system (58k parameters) achieves state-of-the-art results on a publicly available benchmark but fails when running on bone conduction signals. We then compare its variant BC-VAD (5k parameters and trained on BC data) with a baseline especially designed for a BCM and show that the proposed method achieves better performance under various metrics while meeting the real-time processing requirement for a microcontroller.
Audio and Speech Processing, Sound