Convolutional Maxout Neural Networks for Speech Separation

Like Hui, Meng Cai, Cong Guo, Liang He, Wei-Qiang Zhang, Jia Liu
DOI: https://doi.org/10.1109/isspit.2015.7394335
2015-01-01
Abstract: Speech separation based on deep neural networks (DNNs) has been widely studied recently and has achieved considerable success. However, previous studies are mostly based on fully-connected neural networks. To capture the local information of speech signals, we propose using convolutional maxout neural networks (CMNNs) to separate speech from noise by estimating the ideal ratio mask (IRM) of the time-frequency units. In this work, the proposed CMNN is applied in the frequency domain. Through local filtering and max-pooling, convolutional neural networks can model the local structure of speech signals. Maxout is chosen instead of the sigmoid function to address the saturation problem. In addition, dropout is integrated into the network to improve generalization. The proposed system outperforms a traditional DNN-based system on objective measures of both speech quality and intelligibility.
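The abstract names three building blocks without giving formulas: the ideal ratio mask as the training target, maxout units in place of sigmoids, and max-pooling over local frequency regions. The sketch below illustrates each under stated assumptions. It uses the commonly cited IRM form (S^2 / (S^2 + N^2))^beta with beta = 0.5, a maxout layer with k linear pieces per unit, and non-overlapping max-pooling along the frequency axis. The function names, beta, and the pool size are hypothetical choices for illustration, not settings confirmed by the paper.

```python
import numpy as np


def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """IRM per time-frequency unit: (S^2 / (S^2 + N^2)) ** beta.

    Assumption: inputs are power spectra (frames x bins) of the clean
    speech and the noise before mixing; beta=0.5 is a common choice,
    not necessarily the paper's.
    """
    ratio = speech_power / (speech_power + noise_power + eps)
    return ratio ** beta


def maxout(x, W, b):
    """Maxout layer: each output unit is the max over k affine pieces.

    x: (batch, d_in), W: (d_in, units, k), b: (units, k).
    """
    z = np.einsum("bi,iuk->buk", x, W) + b  # (batch, units, k)
    return z.max(axis=-1)                    # (batch, units)


def freq_max_pool(feature_map, pool=3):
    """Non-overlapping max-pooling along the frequency (last) axis.

    Trailing bins that do not fill a complete pool are dropped.
    The pool size of 3 is a hypothetical default.
    """
    t, f = feature_map.shape
    if pool > f:
        raise ValueError("pool size exceeds number of frequency bins")
    f_trim = f - f % pool
    return feature_map[:, :f_trim].reshape(t, f_trim // pool, pool).max(axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.random((4, 8))  # hypothetical speech power: 4 frames x 8 bins
    N = rng.random((4, 8))  # hypothetical noise power
    mask = ideal_ratio_mask(S, N)
    print(mask.shape, float(mask.min()) >= 0.0, float(mask.max()) <= 1.0)
```

Because each maxout unit is a maximum of affine functions, it is piecewise linear and does not flatten out for large inputs the way a sigmoid does, which is the saturation issue the abstract refers to.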