Robust Video Text Detection Through Parametric Shape Regression, Propagation and Fusion.

Long Chen, Jiahao Shi, Feng Su
DOI: https://doi.org/10.1109/ICME51207.2021.9428195
2021-01-01
Abstract: Scene text in video carries valuable semantic information for various video applications. However, the varied and complex text appearance and video context make reliable detection of scene text in video a challenging task. In this paper, we propose a novel end-to-end video text detection framework with an effective proposal-level text information propagation and fusion mechanism for robust detection of video text. Specifically, building on a parametric shape representation and regression model for intra-frame text detection and an integrated cross-frame text region propagation mechanism, we correlate corresponding text candidates in adjacent frames and propagate a text candidate's shape parameters and features from the previous frame to the current frame as supplementary text cues. These cues are then attentively fused with those of the current frame to improve the detection results. Experimental results on standard benchmarks demonstrate the effectiveness of our method for robustly detecting video text with widely varied appearances.