Automatic Segmentation of Sign Language into Subtitle-Units

Hannah Bull, Michèle Gouiffès, Annelies Braffort
DOI: https://doi.org/10.1007/978-3-030-66096-3_14
2020-01-01
Abstract: We present baseline results for a new task of automatic segmentation of Sign Language video into sentence-like units. We use a corpus of natural Sign Language video with accurately aligned subtitles to train a spatio-temporal graph convolutional network with a BiLSTM on 2D skeleton data to automatically detect the temporal boundaries of subtitles. In doing so, we segment Sign Language video into subtitle-units that can be translated into phrases in a written language. We achieve a ROC-AUC statistic of 0.87 at the frame level and 92% label accuracy within a time margin of 0.6s of the true labels.
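
The "label accuracy within a time margin" reported above can be read as a frame-level accuracy with a temporal tolerance. The sketch below shows one plausible way to compute such a score; it is not taken from the paper. The function names, the binary labelling (1 = frame inside a subtitle-unit, 0 = frame outside), the matching rule, and the 25 fps frame rate are all assumptions made for illustration.

```python
def seconds_to_frames(margin_seconds, fps):
    """Convert a time margin in seconds to a whole number of frames."""
    return int(round(margin_seconds * fps))


def tolerant_accuracy(true_labels, pred_labels, margin_frames):
    """Fraction of frames whose predicted label also occurs among the
    true labels within +/- margin_frames of that frame.

    Assumed matching rule (hypothetical, for illustration only): a
    prediction counts as correct if the same label appears anywhere in
    the true sequence inside the tolerance window.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("label sequences must be non-empty")

    correct = 0
    for i, pred in enumerate(pred_labels):
        lo = max(0, i - margin_frames)      # window start, clipped at 0
        hi = min(n, i + margin_frames + 1)  # window end (exclusive)
        if pred in true_labels[lo:hi]:
            correct += 1
    return correct / n


# Toy example: predicted boundaries are shifted by one frame.
true = [1, 1, 1, 0, 0, 1, 1, 1]
pred = [1, 1, 0, 0, 1, 1, 1, 1]

strict = tolerant_accuracy(true, pred, margin_frames=0)
relaxed = tolerant_accuracy(true, pred, margin_frames=1)

# A 0.6 s margin at an assumed 25 fps corresponds to 15 frames.
margin = seconds_to_frames(0.6, fps=25)
```

With a margin of zero frames this reduces to ordinary frame-level accuracy, so the gap between the strict and relaxed scores indicates how many errors are small boundary shifts rather than missed or spurious segments.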