Abstract: Overlay text in video carries important semantic clues for video information retrieval and summarization. In this paper, we propose a robust method that accurately locates text lines and extracts text even in complex video scenes. In the text localization stage, we adopt a corner-based method. First, corner detection is used to extract corners from video frames as text features. Then a multi-layer filtering mechanism (MLFM), which consists of corner clustering, horizontal projection of corners, background filtering, and heuristic rules, is used to locate the text lines. The MLFM effectively removes isolated corners, locates text lines accurately, and automatically removes background regions and pseudo text lines. In the text extraction stage, we propose a twice-binarization method combined with a polarity judgment on the image. The polarity judgment guides the adjustment of the threshold used in the first binarization. The first binarization processes the main portion of the image, and the remainder is processed by the second binarization. Experimental results show that this approach locates text lines and extracts text in video quickly and robustly, even against complex backgrounds.
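
The horizontal projection of corners in the localization stage can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes corners have already been detected and are given as (x, y) pixel coordinates, and the function name `locate_text_rows` and the thresholds `min_corners_per_row` and `min_band_height` are hypothetical choices made only for illustration. The sketch counts corners per image row and keeps contiguous bands of rows that are dense in corners as candidate text lines.

```python
from typing import Iterable, List, Tuple


def locate_text_rows(
    corners: Iterable[Tuple[int, int]],
    frame_height: int,
    min_corners_per_row: int = 3,   # assumed threshold, not from the paper
    min_band_height: int = 2,       # assumed threshold, not from the paper
) -> List[Tuple[int, int]]:
    """Return inclusive (top, bottom) row ranges that are dense in corners."""
    # Horizontal projection: count the corners that fall on each row.
    histogram = [0] * frame_height
    for _x, y in corners:
        if 0 <= y < frame_height:
            histogram[y] += 1

    # Group consecutive dense rows into bands; drop bands that are too thin.
    bands: List[Tuple[int, int]] = []
    start = None
    for y in range(frame_height + 1):
        dense = y < frame_height and histogram[y] >= min_corners_per_row
        if dense and start is None:
            start = y
        elif not dense and start is not None:
            if y - start >= min_band_height:
                bands.append((start, y - 1))
            start = None
    return bands


# Synthetic example: three rows dense in corners plus two isolated corners.
corners = [(x, y) for y in (10, 11, 12) for x in range(20, 60, 5)]
corners += [(5, 40), (90, 70)]
print(locate_text_rows(corners, frame_height=100))  # [(10, 12)]
```

In this toy input the two isolated corners fall below the per-row threshold and are discarded, which mirrors, in a simplified way, the role the abstract assigns to filtering isolated corners before text lines are located.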

A Robust Approach for Overlay Text Localization and Extraction in Complex Video Scene