Abstract: Achieving a high recognition rate for text in action video images is challenging because multiple types of text appear against backgrounds with unpredictable actions. In this paper, we propose a new method for classifying caption text (text edited into the video) and scene text (text that is part of the recorded scene) in video images. This work considers five action classes, namely Yoga, Concert, Teleshopping, Craft, and Recipes, in which both types of text are expected to play a vital role in understanding the video content. The proposed method introduces a new fusion criterion based on Discrete Cosine Transform (DCT) and Fourier coefficients to obtain reconstructed images for caption and scene text. The fusion criterion computes the variances of the coefficients at corresponding pixels of the DCT and Fourier images and uses these variances as the respective weights. This step yields Reconstructed image-1. Inspired by the ability of Chebyshev-Harmonic-Fourier-Moments (CHFM) to reconstruct a redundancy-free image, we explore CHFM to obtain Reconstructed image-2. The reconstructed images, along with the input image, are passed to a Deep Convolutional Neural Network (DCNN) for caption/scene text classification. Experimental results on the five action classes and a comparative study with existing methods demonstrate that the proposed method is effective. In addition, recognition results obtained from different methods before and after classification show that recognition performance improves significantly after classification.
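The variance-weighted fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the variances are computed or how the fused coefficients are turned back into an image, so everything specific here is an assumption made for illustration: the variance is taken over a small local window around each coefficient, the Fourier image is represented by its magnitude, the weights are normalized to sum to one, and the function names (`dct2`, `local_variance`, `variance_weighted_fusion`) are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def dct2(image):
    """Orthonormal 2-D DCT-II, computed with explicit basis matrices."""
    def basis(n):
        k = np.arange(n)[:, None]  # frequency index
        i = np.arange(n)[None, :]  # sample index
        c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        c[0, :] = np.sqrt(1.0 / n)  # DC row scaling for orthonormality
        return c

    rows, cols = image.shape
    return basis(rows) @ image @ basis(cols).T


def local_variance(coeffs, window=3):
    """Variance of each coefficient's neighbourhood (assumed odd window size)."""
    pad = window // 2
    padded = np.pad(coeffs, pad, mode="reflect")
    patches = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    return patches.var(axis=(-2, -1))


def variance_weighted_fusion(image, window=3, eps=1e-12):
    """Fuse DCT and Fourier-magnitude maps using local variances as weights."""
    image = np.asarray(image, dtype=np.float64)
    dct_map = dct2(image)
    # Assumption: the Fourier image is represented by its magnitude.
    fft_map = np.abs(np.fft.fft2(image, norm="ortho"))

    var_dct = local_variance(dct_map, window)
    var_fft = local_variance(fft_map, window)

    # Assumption: weights are the variances normalized to sum to one.
    total = var_dct + var_fft + eps
    w_dct = var_dct / total
    w_fft = var_fft / total

    fused = w_dct * dct_map + w_fft * fft_map
    return fused, w_dct, w_fft


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((32, 32))  # stand-in for a grayscale text-line image
    fused, w_dct, w_fft = variance_weighted_fusion(frame)
    print(fused.shape, float(w_dct.mean()), float(w_fft.mean()))
```

The sketch stops at the fused coefficient map. Producing Reconstructed image-1 from it would need an inverse step that the abstract does not describe, so that part is left out rather than guessed.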

A Novel Video Caption Detection Approach Using Multi-Frame Integration

A New Video Text Detection Method

A Novel Approach to Text Detection and Extraction from Videos by Discriminative Features and Density

A novel video text extraction approach based on multiple frames

A New Hybrid Method for Caption and Scene Text Classification in Action Video Images

A Novel Algorithm for the Video Caption Extraction

Automatic Caption Location and Extraction in Digital Video Frame Based on SVM and ICA

A New Technique for Multi-Oriented Scene Text Line Detection and Tracking in Video

A New DCT-FFT Fusion Based Method for Caption and Scene Text Classification in Action Video Images

Exploring Inter-Frame Correlation Analysis and Wavelet-Domain Modeling for Real-Time Caption Detection in Streaming Video

A Novel Video Object Tracking Approach Based on Kernel Density Estimation and Markov Random Field

Using Multiple Frame Integration for the Text Recognition of Video

A Novel Multi-oriented Chinese Text Extraction Approach from Videos

Caption-aided Speech Detection in Videos

New Tampered Features for Scene and Caption Text Classification in Video Frame

Automatic video superimposed text detection based on Nonsubsampled Contourlet Transform

Watch It Twice: Video Captioning with a Refocused Video Encoder

Multi-Oriented Text Detection and Verification in Video Frames and Scene Images

Video Scene Text Frames Categorization for Text Detection and Recognition

Temporal Integration for Word-Wise Caption and Scene Text Identification

Multi-Spectral Fusion Based Approach for Arbitrarily Oriented Scene Text Detection in Video Images