Text Detection Through Multiple-Scale Localization in Video Sequences

Huang Jian, Zhao Li, Yang Shiqiang (黄剑, 赵黎, 杨士强)
DOI: https://doi.org/10.3321/j.issn:1000-0054.2004.01.013
2004-01-01
Abstract: Localizing text regions in frames is key to successful video optical character recognition (Video-OCR). This paper describes a method for detecting, localizing, and refining on-screen text rectangles by analyzing gradient, textural, and temporal characteristics. The algorithm localizes text at multiple scales using a support vector machine (SVM). Preprocessing of the text bounding boxes limits the SVM search range, while text refinement uses temporal information to remove occasional false alarms. Experimental results show that the algorithm exceeds the hit rate of similar algorithms by up to 21% and reduces the miss rate by up to 57%. Integrating multiple characteristics adds constraints that substantially reduce computational complexity and leads to better performance in tests with both Chinese and English characters.
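The abstract does not spell out how temporal information is used to remove false alarms. As an illustration only, the idea can be sketched as a persistence check: a detected box is kept only if overlapping boxes also appear in enough neighboring frames. Everything below is an assumption rather than the authors' procedure, including the function names `iou` and `temporal_refine`, the `(x1, y1, x2, y2)` box layout, and the `window`, `min_support`, and `iou_thresh` parameters.

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned rectangles."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def temporal_refine(
    frames: List[List[Box]],
    window: int = 2,          # assumed: frames to look at on each side
    min_support: int = 2,     # assumed: neighboring frames that must agree
    iou_thresh: float = 0.5,  # assumed: overlap needed to count as the same text
) -> List[List[Box]]:
    """Keep a box only if similar boxes appear in enough nearby frames."""
    refined: List[List[Box]] = []
    for t, boxes in enumerate(frames):
        lo, hi = max(0, t - window), min(len(frames), t + window + 1)
        kept = []
        for box in boxes:
            # Count neighboring frames that contain an overlapping box.
            support = sum(
                1
                for s in range(lo, hi)
                if s != t and any(iou(box, other) >= iou_thresh for other in frames[s])
            )
            if support >= min_support:
                kept.append(box)
        refined.append(kept)
    return refined


# Per-frame detections; the second box in frame 1 appears only once.
frames = [
    [(10, 10, 110, 40)],
    [(10, 10, 110, 40), (200, 200, 230, 220)],
    [(12, 10, 112, 40)],
    [(10, 10, 110, 40)],
]
refined = temporal_refine(frames)
```

In this toy input the caption-like box persists across all four frames and survives, while the one-off detection in the second frame is dropped. On-screen text usually stays visible for many consecutive frames, which is why a persistence test of this kind is a plausible way to suppress occasional false alarms.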