Low Complexity Own Voice Reconstruction for Hearables with an In-ear Microphone

Mattes Ohlenbusch, Christian Rollwage, Simon Doclo
2024-09-06
Abstract: Hearable devices, equipped with one or more microphones, are commonly used for speech communication. Here, we consider the scenario where a hearable is used to capture the user's own voice in a noisy environment. In this scenario, own voice reconstruction (OVR) is essential for enhancing the quality and intelligibility of the recorded noisy own voice signals. In previous work, we developed a deep learning-based OVR system, aiming to reduce the amount of device-specific recordings for training by using data augmentation with phoneme-dependent models of own voice transfer characteristics. Given the limited computational resources available on hearables, in this paper we propose low-complexity variants of an OVR system based on the FT-JNF architecture and investigate the required amount of device-specific recordings for effective data augmentation and fine-tuning. Simulation results show that the proposed OVR system considerably improves speech quality, even under constraints of low complexity and a limited amount of device-specific recordings.
Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses how a low-complexity own voice reconstruction (OVR) system can improve the quality and intelligibility of recordings when a hearable captures the user's own voice in a noisy environment. Specifically:

1. **Problem Background**:
   - Hearables (such as smart earplugs) are commonly used for speech communication in noisy environments.
   - The external microphone captures environmental noise along with the user's own voice.
   - The in-ear microphone picks up less environmental noise because the ear canal is blocked, but the own voice signal it records suffers from low-frequency amplification, limited bandwidth, and body-produced noise.
2. **Existing Challenges**:
   - Previous deep learning methods are effective, but their high computational complexity and large number of parameters make them unsuitable for resource-limited hearables.
   - Training an OVR system requires a large number of device-specific own voice recordings, which are difficult to obtain in practice.
3. **Research Objectives**:
   - Propose a low-complexity OVR system based on the FT-JNF architecture to reduce the computational resource requirements.
   - Investigate how many device-specific own voice recordings are needed for data augmentation and fine-tuning, so that speech quality still improves when only a small number of such recordings is available.
4. **Solutions**:
   - Design a low-complexity OVR system using the Frequency and Time Joint Nonlinear Filter (FT-JNF) architecture.
   - Generate additional training data through phoneme-dependent data augmentation and use a small number of device-specific recordings for fine-tuning.
   - Evaluate variants of different complexity and analyze how the number of device-specific recordings affects performance.
In summary, this paper aims to develop a low-complexity own voice reconstruction system suitable for hearables, which can significantly improve voice quality and intelligibility with limited computational resources and a small number of device-specific recordings.
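To make the notion of "low-complexity variants" concrete, the following sketch counts the trainable parameters of a two-LSTM stack as the hidden sizes shrink. It is only an illustration under stated assumptions and not the authors' configuration: the layer layout (a bidirectional LSTM along frequency, a unidirectional LSTM along time, then a linear output layer), the function names, and all sizes are hypothetical. The LSTM parameter formula follows the common convention of two bias vectors per gate.

```python
def lstm_params(input_size: int, hidden_size: int, bidirectional: bool = False) -> int:
    """Trainable parameters of one LSTM layer (four gates, two bias vectors per gate)."""
    per_direction = 4 * (
        hidden_size * input_size      # input-to-hidden weights
        + hidden_size * hidden_size   # hidden-to-hidden weights
        + 2 * hidden_size             # input and recurrent biases
    )
    return per_direction * (2 if bidirectional else 1)


def two_lstm_filter_params(in_features: int, h_freq: int, h_time: int,
                           out_features: int = 2) -> int:
    """Parameter count of a hypothetical two-LSTM filter (assumed layout, not from the paper)."""
    # Assumed: bidirectional LSTM running along the frequency axis.
    freq_lstm = lstm_params(in_features, h_freq, bidirectional=True)
    # Assumed: unidirectional LSTM along time, fed by both directions of the first LSTM.
    time_lstm = lstm_params(2 * h_freq, h_time)
    # Linear layer mapping to e.g. the real and imaginary part of a mask.
    linear = h_time * out_features + out_features
    return freq_lstm + time_lstm + linear


if __name__ == "__main__":
    # Hypothetical input: 2 microphones x (real, imaginary) = 4 features per bin.
    for h_freq, h_time in [(512, 128), (128, 64), (32, 16)]:
        total = two_lstm_filter_params(4, h_freq, h_time)
        print(f"h_freq={h_freq:4d} h_time={h_time:4d} -> {total:,} parameters")
```

Because the recurrent weights grow quadratically with the hidden size, reducing the hidden sizes cuts the parameter count far more than proportionally, which is why hidden size is a natural knob when targeting resource-limited devices.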