Cepstral Smoothing of Spectral Masks for Acoustic Vector-Sensor Based Convolutive Speech Separation

Xiaoyi Chen, Yingmin Wang
DOI: https://doi.org/10.1109/icspcc.2014.6986318
2014-01-01
Abstract: A novel algorithm that combines binary time-frequency (T-F) masking with cepstral smoothing is proposed for separating convolutively mixed speech sources. The direction of arrival (DOA) is estimated from a single acoustic vector-sensor (AVS) and used as a cue to estimate the T-F masks. Smoothing is applied in the cepstral domain to reduce the musical noise introduced by binary T-F masking. The proposed method is evaluated on mixtures of three speech sources generated with simulated room models. Separation performance is quantified by the signal-to-distortion ratio (SDR) in dB. Compared with the baseline method, which uses the binary mask without smoothing, the proposed method performs consistently better across various reverberation levels and angular differences between the sources.
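The abstract names two steps, DOA-driven binary masking and cepstral-domain smoothing, without giving their details. The following is a minimal sketch of how such a pipeline might look, not the authors' implementation. It assumes per-bin DOA estimates are already available, assigns each T-F bin to the nearest source direction, and smooths the log-mask cepstrum recursively over time. The function names and the parameters `beta_low`, `beta_high`, `n_low`, and `floor` are hypothetical choices made for illustration only.

```python
import numpy as np


def binary_masks_from_doa(doa_tf, source_doas):
    """Assign each T-F bin to the source whose DOA is closest.

    doa_tf: (frames, bins) per-bin DOA estimates in degrees (assumed given).
    source_doas: nominal source directions in degrees (assumed known).
    Returns an array of shape (n_sources, frames, bins) with 0/1 entries.
    """
    doa_tf = np.asarray(doa_tf, dtype=float)
    src = np.asarray(source_doas, dtype=float)
    # Wrapped angular distance in [0, 180] degrees.
    diff = np.abs((doa_tf[None] - src[:, None, None] + 180.0) % 360.0 - 180.0)
    nearest = np.argmin(diff, axis=0)
    return (nearest[None] == np.arange(len(src))[:, None, None]).astype(float)


def cepstral_smooth(mask, beta_low=0.0, beta_high=0.8, n_low=8, floor=1e-3):
    """Recursively smooth a mask over time in the cepstral domain.

    mask: (frames, bins) one-sided mask, bins = n_fft // 2 + 1.
    Low quefrencies (coarse spectral envelope) use beta_low, the rest use
    beta_high. All parameter values here are illustrative assumptions.
    """
    mask = np.asarray(mask, dtype=float)
    n_fft = 2 * (mask.shape[1] - 1)
    # Floor the mask so the logarithm stays finite for zero entries.
    log_mask = np.log(np.maximum(mask, floor))
    # Real cepstrum of each frame, shape (frames, n_fft).
    ceps = np.fft.irfft(log_mask, n=n_fft, axis=1)

    # Quefrency-dependent smoothing factors, kept symmetric so the
    # smoothed cepstrum still maps back to a real log-mask.
    beta = np.full(n_fft, beta_high)
    beta[:n_low] = beta_low
    beta[n_fft - n_low + 1:] = beta_low

    # First-order recursive averaging along the time axis.
    smoothed = np.empty_like(ceps)
    smoothed[0] = ceps[0]
    for t in range(1, ceps.shape[0]):
        smoothed[t] = beta * smoothed[t - 1] + (1.0 - beta) * ceps[t]

    # Back to the spectral domain and undo the logarithm.
    log_smoothed = np.fft.rfft(smoothed, n=n_fft, axis=1).real
    return np.clip(np.exp(log_smoothed), 0.0, 1.0)
```

In this sketch, leaving the low quefrencies nearly unsmoothed is meant to preserve the broad spectral shape of the mask, while heavier smoothing of the remaining coefficients attenuates isolated mask peaks of the kind associated with musical noise. A smoothed mask would then be multiplied with the mixture spectrogram before resynthesis.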