Abstract: Achieving a high recognition rate for text in action video images is challenging because multiple types of text appear against backgrounds containing unpredictable actions. In this paper, we propose a new method for classifying caption text (text that is edited into the video) and scene text (text that is part of the video content) in video images. This work considers five action classes, namely Yoga, Concert, Teleshopping, Craft, and Recipes, where both types of text are expected to play a vital role in understanding the video content. The proposed method introduces a new fusion criterion based on Discrete Cosine Transform (DCT) and Fourier coefficients to obtain reconstructed images for caption and scene text. The fusion criterion computes the variances of the coefficients at corresponding pixels of the DCT and Fourier images and uses those variances as the respective weights. This step yields Reconstructed image-1. Inspired by the special property of Chebyshev-Harmonic-Fourier-Moments (CHFM), which can reconstruct a redundancy-free image, we explore CHFM to obtain Reconstructed image-2. The reconstructed images, along with the input image, are passed to a Deep Convolutional Neural Network (DCNN) for caption/scene text classification. Experimental results on the five action classes and a comparative study with existing methods demonstrate that the proposed method is effective. In addition, recognition results obtained from different recognition methods before and after classification show that recognition performance improves significantly once the text has been classified.
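The variance-weighted fusion step can be illustrated with a minimal sketch. The abstract does not specify how the variances are computed or how the fused coefficients are turned back into an image, so the following are assumptions made only for illustration: the DCT and Fourier coefficients are already available as two real-valued grids of the same size (for example, Fourier magnitudes), the variance at each pixel is taken over a small local window, and the two variances are normalized to sum to one before weighting. The function names `local_variance` and `fuse_coefficients` and the `radius` parameter are hypothetical, and the reconstruction of Reconstructed image-1 from the fused grid is not shown.

```python
from statistics import pvariance


def local_variance(grid, r, c, radius=1):
    """Population variance of the window centred at (r, c), clipped at the borders.

    Assumption: a local window is used; the abstract does not state the window.
    """
    rows, cols = len(grid), len(grid[0])
    window = [
        grid[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]
    return pvariance(window)


def fuse_coefficients(dct_coeffs, fourier_coeffs, radius=1):
    """Fuse two equally sized coefficient grids using local variances as weights.

    Assumption: weights are normalized so that they sum to one at each pixel.
    """
    rows, cols = len(dct_coeffs), len(dct_coeffs[0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            var_dct = local_variance(dct_coeffs, r, c, radius)
            var_fourier = local_variance(fourier_coeffs, r, c, radius)
            total = var_dct + var_fourier
            if total == 0:
                # Both neighbourhoods are flat: fall back to equal weights.
                w_dct = w_fourier = 0.5
            else:
                w_dct = var_dct / total
                w_fourier = var_fourier / total
            fused[r][c] = w_dct * dct_coeffs[r][c] + w_fourier * fourier_coeffs[r][c]
    return fused
```

Under this weighting, the coefficient map with more local variation dominates the fused value at each pixel, and a pixel where both maps are locally flat receives the plain average of the two coefficients.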
