Three-Dimensional Joint Geometric-Physiologic Feature For Lip-Reading

Jianguo Wei, Fan Yang, Ju Zhang, Ruiguo Yu, Mei Yu, Jianrong Wang
DOI: https://doi.org/10.1109/ICTAI.2018.00155
2018-01-01
Abstract: Lip-reading has been shown to improve the performance of automatic speech recognition systems, especially in the presence of acoustic noise. However, the information about lip movement remains insufficient because lip features are obtained from discrete three-dimensional points and planar images, so the internal mechanisms of lip movement are neither described nor reflected. In this paper, we employ densely connected convolutional networks (DenseNets), a recent deep learning architecture, to obtain visual representations from color images. In addition, we extract a new 3D lip physiologic feature, based on the position and structure of facial muscles, to represent similarities in the way people speak. The color image feature and the 3D lip geometric-physiologic feature are coupled in the last fully-connected layer of the DenseNet. Experimental results show that DenseNets can handle the spatio-temporal information of a whole image sequence, and that adding the proposed 3D geometric-physiologic feature improves the recognition rate by as much as 3.91 percentage points (from 94.84% with color images only to 98.75%).
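The coupling of the two features in the last fully-connected layer can be illustrated with a minimal late-fusion sketch. The abstract does not specify the fusion operator, the feature dimensions, or the number of classes, so everything here is an assumption made for illustration: fusion is taken to be plain concatenation, the names `fuse_and_classify`, `IMAGE_DIM`, `LIP3D_DIM`, and `NUM_CLASSES` are hypothetical, and random vectors stand in for the DenseNet output and the 3D geometric-physiologic feature.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(image_feat, lip3d_feat, weights, bias):
    """Concatenate both feature vectors, then apply one fully-connected layer.

    Concatenation is an assumed fusion operator; the abstract only says the
    features are coupled in the last fully-connected layer.
    """
    fused = list(image_feat) + list(lip3d_feat)
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(scores)


# Hypothetical sizes, chosen only to keep the example small.
IMAGE_DIM, LIP3D_DIM, NUM_CLASSES = 8, 4, 3

rng = random.Random(0)
# Stand-ins for the DenseNet image feature and the 3D lip feature.
image_feat = [rng.gauss(0, 1) for _ in range(IMAGE_DIM)]
lip3d_feat = [rng.gauss(0, 1) for _ in range(LIP3D_DIM)]
# Untrained weights of the final fully-connected layer.
weights = [
    [rng.gauss(0, 0.1) for _ in range(IMAGE_DIM + LIP3D_DIM)]
    for _ in range(NUM_CLASSES)
]
bias = [0.0] * NUM_CLASSES

probs = fuse_and_classify(image_feat, lip3d_feat, weights, bias)
print(probs)  # one probability per class
```

With concatenation, the final layer learns separate weights for the image and 3D components, which is one simple way an extra feature can change the classification without altering the convolutional layers.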