Online Scene Text Tracking with Spatial-Temporal Relation

Yan Xiu, Hong-Yang Zhou, Shu Tian, Xu-Cheng Yin
DOI: https://doi.org/10.1007/978-3-030-87361-5_50
2021-01-01
Abstract: Scene texts in video vary in color, size, and format and are easily confused with the background, which poses significant challenges for video scene text tracking. As a result, trajectories are often fragmented. Most tracking methods focus on matching appearance features and temporal information across frames, treating each text as a separate object. However, the relations among all texts are also important cues. In this paper, we propose a novel online video scene text tracking approach with a spatial-temporal relation module that utilizes multiple cues, i.e., appearance, geometry, and temporal information. The spatial-temporal relation module enhances appearance features by modeling the relations among texts in the same frame, which can avoid the influence of bad detection results and allows texts to be tracked stably and consistently. With the spatial-temporal relation module, we achieve more tracked texts and more complete trajectories on IC15.
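The abstract does not specify how the relation module is built, so the following is only a minimal sketch of the general idea of enhancing each text's appearance feature with context from the other texts in the same frame. Everything here is an assumption made for illustration, not the authors' architecture: the function name `enhance_features`, the dot-product appearance affinity, the center-distance geometry penalty, the `geometry_weight` parameter, and the residual sum are all hypothetical choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def center_distance(box_a, box_b):
    """Euclidean distance between centers of two (x1, y1, x2, y2) boxes."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return math.hypot(ax - bx, ay - by)


def enhance_features(features, boxes, geometry_weight=0.01):
    """Return relation-enhanced features for all texts in one frame.

    Assumed scheme (hypothetical): each text attends to every other text
    in the frame. The attention score is the appearance affinity minus a
    penalty proportional to the distance between box centers; the weighted
    context is then added to the original feature.
    """
    enhanced = []
    count = len(features)
    for i in range(count):
        others = [j for j in range(count) if j != i]
        if not others:
            # A lone text has no relations to model; keep its feature as-is.
            enhanced.append(list(features[i]))
            continue
        scores = [
            dot(features[i], features[j])
            - geometry_weight * center_distance(boxes[i], boxes[j])
            for j in others
        ]
        weights = softmax(scores)
        context = [
            sum(w * features[j][k] for w, j in zip(weights, others))
            for k in range(len(features[i]))
        ]
        enhanced.append([f + c for f, c in zip(features[i], context)])
    return enhanced
```

In a tracker along these lines, the enhanced features would replace the raw appearance features when matching detections across frames, so that a poorly detected text can still borrow evidence from its neighbors. How the paper combines this with the temporal cue is not stated in the abstract and is left out of the sketch.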