Synergistic Optimization Based Binaural Time-Frequency Masking for Speech Source Localization.

Hong Liu, Lulu Wu, Bing Yang
DOI: https://doi.org/10.1109/robio49542.2019.8961527
2019-01-01
Abstract: Monaural time-frequency (TF) masking has been demonstrated to advance the performance of binaural speech source localization. However, it fails to consider interaural information, which may result in severe distortion of interaural cues. To mitigate these impacts, this paper presents a novel method for binaural speech source localization based on binaural TF masking. First, a CNN-based binaural TF masking network is designed to suppress noise and reverberation in TF fragments; it is trained in an independent stage. Then, the resulting binaural TF masking is synergistically refined with the localization network to compensate for the distorted interaural cues. The final source direction is estimated using the trained network. The proposed method is compared with other baseline methods and with two-stage models composed of a cascaded TF masking network and localization network. Experimental results show that our method outperforms the compared methods in adverse environments with different reverberation times and signal-to-noise ratios.
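The abstract's claim that monaural masking can distort interaural cues can be illustrated with a small numerical sketch. This is not the paper's network: the toy spectrograms, the random masks, and the helper names `ild_db` and `ipd` are assumptions made purely for illustration. The sketch applies real-valued masks to the left and right channels independently and compares the interaural level difference (ILD) and interaural phase difference (IPD) against the unmasked signals and against a mask shared by both ears.

```python
import numpy as np

def ild_db(left, right, eps=1e-12):
    """Interaural level difference in dB for each TF unit (hypothetical helper)."""
    return 20.0 * np.log10((np.abs(left) + eps) / (np.abs(right) + eps))

def ipd(left, right):
    """Interaural phase difference in radians for each TF unit (hypothetical helper)."""
    return np.angle(left * np.conj(right))

rng = np.random.default_rng(0)
shape = (4, 5)  # toy size: (frequency bins, frames)

# Toy binaural spectrogram: the right ear is an attenuated, phase-shifted copy
# of the left ear, so the true ILD is about 6.02 dB and the true IPD is 0.3 rad.
left = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
right = 0.5 * left * np.exp(-1j * 0.3)

# Assumed masks: random real-valued gains standing in for the output of a
# per-ear (monaural) masking model that ignores the other channel.
mask_left = rng.uniform(0.2, 1.0, shape)
mask_right = rng.uniform(0.2, 1.0, shape)

ild_clean = ild_db(left, right)
ild_indep = ild_db(mask_left * left, mask_right * right)   # independent masks
ild_shared = ild_db(mask_left * left, mask_left * right)   # same mask on both ears

# Independent masks shift the ILD by 20*log10(mask_left / mask_right).
ild_shift = ild_indep - ild_clean

# Real, positive masks leave the phase difference untouched.
ipd_clean = ipd(left, right)
ipd_indep = ipd(mask_left * left, mask_right * right)

print("max ILD shift, independent masks (dB):", np.max(np.abs(ild_shift)))
print("max ILD shift, shared mask (dB):", np.max(np.abs(ild_shared - ild_clean)))
```

In this toy setting the shared mask preserves the ILD while the independent masks do not, which is one simple way to see why a masking stage that ignores the other ear can disturb the cues a localizer depends on. How the paper's binaural masking and joint refinement actually address this is described only at the level of the abstract above.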