A Novel Resynchronization Procedure For Hand-Lips Fusion Applied To Continuous French Cued Speech Recognition

Li Liu, Gang Feng, Denis Beautemps, Xiao-Ping Zhang
DOI: https://doi.org/10.23919/EUSIPCO.2019.8903053
2019-01-01
Abstract: Cued Speech (CS) augments lip reading with hand coding. Because lip and hand movements are asynchronous, and directly fusing asynchronous features may reduce recognition efficiency, fusing them in automatic CS recognition is a challenging problem. In our previous work, we built a hand preceding model for hand positions (vowels) by investigating the temporal organization of hand movements in French CS. In this work, we investigate a suitable value of the hand preceding time for consonants by analyzing the temporal movements of hand shapes in French CS. Then, based on these two results, we propose an efficient resynchronization procedure for the fusion of multi-stream features in CS. This procedure is applied to continuous CS phoneme recognition based on a multi-stream CNN-HMM architecture. The results show that the procedure improves phoneme recognition correctness by about 4.6% compared with the state of the art, which does not take into account the asynchrony between modalities.
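The resynchronization idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each hand stream leads the lip stream by a fixed number of frames, and it simply delays each hand stream by that lead before concatenating the features frame by frame. The function names `delay_stream` and `fuse_streams`, the one-dimensional toy features, and the lead values (2 frames for hand position, 1 frame for hand shape) are all hypothetical; the abstract does not state the actual preceding times, and the vowel model from the earlier work may be more elaborate than a constant shift.

```python
from typing import List, Sequence

Frame = List[float]


def delay_stream(stream: Sequence[Frame], lead_frames: int) -> List[Frame]:
    """Delay a stream by lead_frames so it lines up with a later stream.

    Frame t of the output is frame t - lead_frames of the input.
    The first input frame is repeated to pad the start (an assumption).
    """
    if lead_frames < 0:
        raise ValueError("lead_frames must be non-negative")
    return [list(stream[max(t - lead_frames, 0)]) for t in range(len(stream))]


def fuse_streams(
    lips: Sequence[Frame],
    hand_position: Sequence[Frame],
    hand_shape: Sequence[Frame],
    position_lead: int,
    shape_lead: int,
) -> List[Frame]:
    """Resynchronize both hand streams to the lips, then concatenate per frame."""
    if not (len(lips) == len(hand_position) == len(hand_shape)):
        raise ValueError("all streams must have the same number of frames")
    position = delay_stream(hand_position, position_lead)
    shape = delay_stream(hand_shape, shape_lead)
    return [list(l) + p + s for l, p, s in zip(lips, position, shape)]


# Toy example with one feature per stream and hypothetical lead values.
lips = [[0.0], [1.0], [2.0], [3.0]]
hand_position = [[10.0], [11.0], [12.0], [13.0]]
hand_shape = [[20.0], [21.0], [22.0], [23.0]]

fused = fuse_streams(lips, hand_position, hand_shape, position_lead=2, shape_lead=1)
for frame in fused:
    print(frame)
```

In this sketch the fused frames would then be passed to the recognizer in place of a direct, unshifted concatenation. Separate lead values are kept for the two hand streams because the abstract treats the preceding time for hand positions (vowels) and for hand shapes (consonants) as distinct quantities.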