An Adaptive Method Based on Multiscale Dilated Convolutional Network for Binaural Speech Source Localization

Lulu Wu, Hong Liu, Bing Yang, Runwei Ding
DOI: https://doi.org/10.1155/2020/5819624
IF: 2.3
2020-01-01
Complexity
Abstract: Most binaural speech source localization models perform poorly in unprecedentedly noisy and reverberant situations. This paper approaches the issue with a multiscale dilated convolutional neural network (CNN). The time-related cross-correlation function (CCF) and the energy-related interaural level differences (ILD) are preprocessed in separate branches of a dilated convolutional network, so that the multiscale dilated CNN can encode a discriminative representation for each of the two cues. After encoding, the individual interaural representations are fused and mapped to the source direction. Furthermore, to improve parameter adaptation, a novel semiadaptive entropy is proposed to train the network under directional constraints. Experimental results show that the proposed method can adaptively locate speech sources in simulated noisy and reverberant environments.
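To make the two input cues concrete, the following is a minimal sketch of how a CCF and an ILD might be computed for one frame of a two-channel signal. It is not the paper's implementation: the function name `interaural_features`, the `max_lag` parameter, the full-band (rather than per-frequency-channel) computation, and the sign conventions are all assumptions made for illustration, and plain Python is used only to keep the example dependency-free.

```python
import math


def interaural_features(left, right, max_lag):
    """Return (lags, ccf, ild_db) for one frame of a two-channel signal.

    Hypothetical helper for illustration; the paper's own feature
    extraction may differ (e.g. per-band processing, other normalization).
    """
    if len(left) != len(right):
        raise ValueError("left and right frames must have equal length")
    n = len(left)
    energy_l = sum(x * x for x in left)
    energy_r = sum(x * x for x in right)
    eps = 1e-12  # guards against division by zero on silent frames
    norm = math.sqrt(energy_l * energy_r) + eps

    lags = list(range(-max_lag, max_lag + 1))
    ccf = []
    for lag in lags:
        # Correlate left[t + lag] with right[t] over the overlapping samples.
        start = max(0, -lag)
        stop = min(n, n - lag)
        acc = sum(left[t + lag] * right[t] for t in range(start, stop))
        ccf.append(acc / norm)

    # Assumed convention: positive ILD means more energy in the left channel.
    ild_db = 10.0 * math.log10((energy_l + eps) / (energy_r + eps))
    return lags, ccf, ild_db
```

Under these conventions, a right channel that is a delayed and attenuated copy of the left channel yields a CCF peak at a negative lag and a positive ILD. In a setup like the one the abstract describes, vectors of this kind would be fed to the separate CCF and ILD branches of the network.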