The Impact of Reduced Video Quality on Visual Speech Recognition

Laura Dungan, Ali Karaali, Naomi Harte
DOI: https://doi.org/10.1109/icip.2018.8451754
2018-10-01
Abstract: Speech recognition technology has become widespread in recent years, to the point where almost anyone with a laptop or mobile device has access to it. Despite this, recognition remains poor in noisy environments. Audio-Visual Speech Recognition (AVSR) offers a possible solution, as the visual channel is not affected by acoustic noise. However, other factors can degrade performance, notably poor-quality video footage. This aspect of the visual side of speech recognition in noise is less explored, partly due to a lack of large, publicly available, high-quality audio-visual continuous-speech databases. Fortunately, these problems can now be considered more fully with the availability of datasets such as TCD-TIMIT. In this paper, we explore the impact of the following visual degradations on visual speech recognition: white Gaussian noise, JPEG compression, reduced resolution and motion blur. Experimental results show that in some cases the recogniser can be remarkably resilient, e.g. in the case of motion blur, while other degradations can affect recogniser performance drastically.
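To make the listed degradations concrete, the following is a minimal sketch of three of them applied to a single greyscale frame stored as a list of rows. It is not the authors' implementation: the function names, the block-averaging downsampler, the horizontal box-filter blur and all parameters are assumptions chosen for illustration. JPEG compression is left out because it requires an image codec rather than a few lines of arithmetic.

```python
import random


def add_gaussian_noise(frame, sigma, seed=0):
    """Add zero-mean white Gaussian noise to every pixel, clipped to [0, 255].

    `sigma` is a hypothetical noise standard deviation in grey levels.
    """
    rng = random.Random(seed)
    return [
        [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in frame
    ]


def reduce_resolution(frame, factor):
    """Downsample by averaging non-overlapping factor x factor blocks.

    Rows and columns that do not fill a whole block are dropped.
    """
    height = len(frame) // factor
    width = len(frame[0]) // factor
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            block = [
                frame[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def horizontal_motion_blur(frame, length):
    """Approximate horizontal motion blur with a box filter of `length` pixels.

    Pixels beyond the frame edge are replaced by the nearest edge pixel.
    """
    half = length // 2
    width = len(frame[0])
    out = []
    for row in frame:
        blurred = []
        for j in range(width):
            window = [
                row[min(width - 1, max(0, j + k))]
                for k in range(-half, length - half)
            ]
            blurred.append(sum(window) / length)
        out.append(blurred)
    return out
```

In an experiment of the kind the abstract describes, each function would be applied to every frame of a video at several severity levels (larger `sigma`, `factor` or `length`) before the frames are passed to the visual recogniser.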