Amazigh audiovisual speech recognition system design

Ilham Addarrazi, Hassan Satori, Khalid Satori
DOI: https://doi.org/10.1109/isacv.2017.8054956
2017-04-01
Abstract: It is well known that speech recognition is a multimodal process that uses information not only from audio but also from vision. This paper describes our experience designing an audiovisual speech recognition system that combines acoustic and visual information to improve the noise robustness of automatic speech recognition. Face and mouth detection using the Viola-Jones approach achieved satisfactory accuracy, reaching 99% for face detection and 96.6% for mouth detection.
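The Viola-Jones approach named in the abstract evaluates rectangular Haar-like features in constant time through an integral image (summed-area table). The following is a minimal sketch of that building block only, not the authors' implementation: the function names (integral_image, rect_sum, two_rect_feature) and the toy pixel values are hypothetical, and the boosted cascade that a full detector needs is omitted.

```python
def integral_image(img):
    """Build a summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of all pixels above and to the left of (y, x),
    exclusive, so any rectangle sum needs only four lookups.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in the rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: left half minus right half.

    Assumes width is even so both halves have the same area.
    """
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Toy 2x4 grayscale patch (illustrative values, not data from the paper).
patch = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
table = integral_image(patch)
feature_value = two_rect_feature(table, 0, 0, 2, 4)
```

In a complete detector, many such feature values computed over a sliding window feed a cascade of boosted classifiers; the sketch shows only why each feature costs a fixed number of lookups regardless of rectangle size.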