Context-Aware Audio-Visual Speech Enhancement Based on Neuro-Fuzzy Modeling and User Preference Learning
Song Chen,Jasper Kirton-Wingate,Faiyaz Doctor,Usama Arshad,Kia Dashtipour,Mandar Gogate,Zahid Halim,Ahmed Al-Dubai,Tughrul Arslan,Amir Hussain
DOI: https://doi.org/10.1109/tfuzz.2024.3435050
IF: 12.253
2024-10-08
IEEE Transactions on Fuzzy Systems
Abstract:It is estimated that by 2050 approximately one in ten individuals globally will experience disabling hearing impairment. In the presence of everyday reverberant noise, a substantial proportion of individual users encounter challenges in speech comprehension. This study introduces a novel application of neuro-fuzzy modeling that synergizes and fuses audio-visual speech enhancement (AV SE) with an initial user preference learning based framework. Specifically, our approach uniquely integrates multimodal AV speech data with innovative SE methods and fuzzy inferencing techniques. This integration is further enriched by incorporating a user-preference learning model that adapts to environmental and user-specific contexts, including signal-to-noise ratios, sound power, and the quality of visual information. The proposed framework facilitates the incorporation of clinical measures such as user cognitive load (or listening effort) with real-world uncertainty to steer the system outputs. We employ an adaptive fuzzy neural network to derive the most effective Sugeno fuzzy inference model, employing particle swarm optimization to ensure optimal SE by considering sound power, ambient noise levels, and visual quality. Experimental results utilize our new benchmark AV multitalker challenge dataset to demonstrate the superiority of our user preference-informed, context-aware AV SE approach in enhancing speech intelligibility and quality in challenging noisy conditions, marking a significant advancement over conventional methods while reducing energy consumption. The conclusion supports the ecological scalability of our approach and its potential for real-world applications, setting a new benchmark in AV SE research, paving the way for future assistive hearing and communication technologies.
computer science, artificial intelligence,engineering, electrical & electronic