Research on deep neural network's hidden layers in phoneme recognition

Yuan Ma, Jianwu Dang, Weifeng Li
DOI: https://doi.org/10.1109/ISCSLP.2014.6936718
2014-01-01
Abstract: Despite the great success of deep neural networks (DNNs) in speech processing, the mechanisms underlying this success remain unclear. This preliminary study investigates how a DNN's hidden layers represent speech articulation. Two sets of experiments are performed on the hidden layers of speech recognition networks. The layer-removing experiment is conducted on the English TIMIT database, and the layer-replacing experiment substitutes a layer in an English DNN with the corresponding layer of a Japanese DNN. The results suggest that different layers are responsible for different phoneme groups, organized by place of articulation. The lower layers are responsible for the back vowels, and the higher layers for the front vowels. The second layer (i.e., the first hidden layer) of the seven-layer network bears the major responsibility for more than half of the consonants whose constriction is located in the front of the vocal tract, while the other consonants rely on the middle and higher layers. The layer-replacing experiment indicated that this relationship is language independent. More elaborate studies are needed to uncover further details.
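The two manipulations named in the abstract, removing a hidden layer and replacing it with the corresponding layer from another network, can be illustrated with a toy NumPy sketch. This is not the authors' implementation. The network sizes, the sigmoid activation, the random weights, and the helper names `make_dnn`, `forward`, `remove_layer`, and `replace_layer` are all assumptions made for illustration. In particular, every hidden layer is given a square weight matrix so that any layer can be dropped without a dimension mismatch, and the five hidden layers are only one reading of "seven-layer network" (input, five hidden, output).

```python
import numpy as np


def make_dnn(rng, dim=8, n_hidden=5, n_classes=4):
    """Build a random toy network (hypothetical sizes, untrained weights)."""
    hidden = [(rng.standard_normal((dim, dim)) * 0.5, np.zeros(dim))
              for _ in range(n_hidden)]
    output = (rng.standard_normal((dim, n_classes)) * 0.5, np.zeros(n_classes))
    return {"hidden": hidden, "output": output}


def forward(net, x):
    """Return the predicted class index for each row of x."""
    h = x
    for W, b in net["hidden"]:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))  # sigmoid hidden units (assumed)
    W, b = net["output"]
    return (h @ W + b).argmax(axis=1)


def remove_layer(net, k):
    """Copy of net with hidden layer k (0-based) skipped."""
    hidden = [layer for i, layer in enumerate(net["hidden"]) if i != k]
    return {"hidden": hidden, "output": net["output"]}


def replace_layer(net, donor, k):
    """Copy of net whose hidden layer k is taken from donor."""
    hidden = list(net["hidden"])
    hidden[k] = donor["hidden"][k]
    return {"hidden": hidden, "output": net["output"]}


rng = np.random.default_rng(0)
english = make_dnn(rng)    # stand-in for a network trained on English
japanese = make_dnn(rng)   # stand-in for a network trained on Japanese
x = rng.standard_normal((16, 8))  # 16 fake feature frames

baseline = forward(english, x)
for k in range(len(english["hidden"])):
    removed = forward(remove_layer(english, k), x)
    replaced = forward(replace_layer(english, japanese, k), x)
    # Fraction of frames whose predicted class changes under each manipulation.
    print(k, (removed != baseline).mean(), (replaced != baseline).mean())
```

With trained networks, the per-layer change in predictions would be broken down by phoneme group to see which groups depend on which layer. The random weights here only show the mechanics.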