Geometrical and Pixel Based Lip Feature Fusion in Speech Synthesis System Driven by Visual-speech

Wang Mengjun
DOI: https://doi.org/10.1109/cinc.2010.5643872
2010-01-01
Abstract: Lipreading is applied to synthesize speech for speech-impaired people. To achieve a higher recognition rate, feature-level data fusion with weighting coefficients is used to integrate lip information from different kinds of lip features. Experiments are carried out using HMMs with different numbers of states and Gaussian mixture components on a small database in the speaker-dependent case. The most important conclusion drawn from the recognition results is that, with a 4-state, 16-Gaussian-mixture-component HMM, the integrated discriminative vector obtained by feature fusion outperforms the geometrical feature vector alone, the DCT descriptor vector alone, and the DCT coefficient vector alone. Compared with cascading the geometrical feature vector with the DCT descriptors, cascading it with the DCT coefficients integrates more information about the lip region, and the recognition rate is improved by as much as 3.18% with the best weighting coefficients (m:n = 1.5:1).
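The weighted cascading described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that m scales the geometrical features and n scales the DCT coefficients, which the abstract does not state explicitly, and that both streams are already extracted frame by frame. The function name `fuse_features` and the toy values are hypothetical.

```python
def fuse_features(geom, dct, m=1.5, n=1.0):
    """Cascade two per-frame feature streams into one weighted vector per frame.

    geom, dct: lists of per-frame feature lists (one inner list per video frame).
    m, n: weighting coefficients; assigning m to the geometrical stream and
          n to the DCT stream is an assumption made for this sketch.
    """
    if len(geom) != len(dct):
        raise ValueError("both feature streams must cover the same number of frames")
    fused = []
    for g_row, d_row in zip(geom, dct):
        # Scale each stream by its weight, then concatenate (cascade) them.
        fused.append([m * g for g in g_row] + [n * d for d in d_row])
    return fused


# Toy data: 2 frames, 2 geometrical measurements and 3 DCT coefficients each.
geom = [[1.0, 2.0], [3.0, 4.0]]
dct = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]

fused = fuse_features(geom, dct)
print(fused[0])  # [1.5, 3.0, 10.0, 20.0, 30.0]
```

In a recognition pipeline, the fused per-frame vectors would serve as the observation sequence for HMM training and decoding. Only the ratio m:n changes the relative influence of the two streams, so fixing n = 1 and varying m is one way to search for the best weighting.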