A pairwise algorithm for pitch estimation and speech separation using deep stacking network

Hui Zhang, Xueliang Zhang, Shuai Nie, Guanglai Gao, Wenju Liu
DOI: https://doi.org/10.1109/ICASSP.2015.7177969
2015-01-01
Abstract: Pitch information is an important cue for speech separation. However, pitch estimation in noisy conditions is itself as challenging as speech separation. In this paper, we propose a supervised learning architecture that addresses both problems jointly. The proposed algorithm is based on a deep stacking network (DSN), which builds a deep architecture by stacking simple processing modules. In the training stage, the ideal binary mask is used as the target. The input vector of each module comprises the output of the lower module and frame-level features, which consist of spectral and pitch-based features. In the testing stage, each module produces an estimated binary mask, which is used to re-estimate the pitch. The updated pitch-based features are then passed to the next module. This procedure is repeated at every module of the DSN, and the final separation result is taken from the last module. Systematic evaluations show that the proposed approach produces high-quality estimated binary masks and generalizes better than recent systems.
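The testing-stage procedure described in the abstract can be sketched as a simple loop. The following is only an illustrative sketch, not the authors' implementation: the function names (`ideal_binary_mask`, `separate_with_stacked_modules`), the 0.5 binarization threshold, the 0 dB local criterion, and the choice to pass the lower module's soft output upward are all assumptions made here. Each trained module and the pitch re-estimation step are represented by hypothetical callables supplied by the caller.

```python
from typing import Callable, List, Optional, Sequence

Frames = List[List[float]]   # one row per frame
Mask = List[List[int]]       # one row per frame, entries 0 or 1


def ideal_binary_mask(speech_energy: Frames, noise_energy: Frames,
                      lc_db: float = 0.0) -> Mask:
    """Training target: 1 where the local SNR exceeds lc_db, else 0.

    Comparing s > n * 10**(lc_db / 10) is equivalent to comparing the
    SNR in dB against lc_db, and avoids taking the log of zero.
    """
    ratio = 10.0 ** (lc_db / 10.0)
    return [[1 if s > n * ratio else 0 for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_energy, noise_energy)]


def separate_with_stacked_modules(
    spectral: Frames,
    pitch_feats: Frames,
    modules: Sequence[Callable[[Frames], Frames]],
    reestimate_pitch_feats: Callable[[Mask], Frames],
    threshold: float = 0.5,  # assumed binarization threshold
) -> Mask:
    """Run the stacked modules in order and return the last module's mask."""
    if not modules:
        raise ValueError("at least one module is required")
    lower_output: Optional[Frames] = None
    mask: Mask = []
    for module in modules:
        # Input vector per frame: spectral features, current pitch-based
        # features, and (above the first module) the lower module's output.
        inputs = []
        for t in range(len(spectral)):
            row = list(spectral[t]) + list(pitch_feats[t])
            if lower_output is not None:
                row += list(lower_output[t])
            inputs.append(row)
        soft = module(inputs)
        mask = [[1 if v > threshold else 0 for v in frame] for frame in soft]
        # The estimated mask is used to re-estimate pitch, and the updated
        # pitch-based features are handed to the next module.
        pitch_feats = reestimate_pitch_feats(mask)
        lower_output = soft
    return mask
```

In use, `modules` would hold the trained DSN modules from bottom to top and `reestimate_pitch_feats` would wrap whatever mask-driven pitch tracker is available; the first module sees only the frame-level features because there is no lower output yet.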