Abstract: Scene text removal is a recent development in computer vision that replaces text patches in natural images with an appropriate background. The task is difficult: results often contain faulty regions in which residual text strokes remain against hazy backgrounds. Real-world text appears in a wide variety of fonts, some of which are hard to localize because of irregular shapes, varying degrees of shading, and orientation distortion. Scene text erasing can be divided into two subtasks, text detection and text inpainting. Both require large amounts of data to succeed, but existing approaches have been limited by insufficient real-world data for scene text removal. Even though existing works have considerably improved scene text removal performance, they often leave text remnants such as strokes, producing low-quality visual results. Therefore, this paper proposes an automatic text inpainting and video quality elevation model based on improved convolutional network techniques. First, video samples are collected from diverse datasets and converted into frames. Next, the frames are deblurred using an enhanced Convolutional Neural Network (CNN) with three convolutional layers so that text can be localized accurately. The text is then detected using a CLARA-based VGG-16 network. Afterward, the text strokes are removed using a convolutional encoder-decoder network, which eliminates text on complex backgrounds and textures. In this step, the coordinates of the detected text in the deblurred frames are used to crop the text stroke regions; the cropped regions are inpainted and then pasted back into their original positions in the frames. Finally, the video quality is elevated using a DenseNet-centric enhancement network. The experimental results demonstrate that the proposed model removes scene text effectively and enhances video quality, reaching up to 52.09 dB PSNR and 93% SSIM and outperforming existing methods.
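The crop, inpaint, and paste-back step and the PSNR metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: frames are assumed to be grayscale nested lists, the box format `(top, left, bottom, right)` and all function names are hypothetical, and `placeholder_inpaint` is a constant-fill stand-in for the encoder-decoder inpainting network.

```python
import math


def crop(frame, box):
    """Return a copy of the region box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def placeholder_inpaint(patch, fill):
    """Stand-in for a learned inpainting network: overwrite the patch with one value."""
    return [[fill for _ in row] for row in patch]


def paste(frame, patch, box):
    """Return a copy of frame with patch written back at the box's original position."""
    top, left, _, _ = box
    out = [row[:] for row in frame]
    for dy, patch_row in enumerate(patch):
        out[top + dy][left:left + len(patch_row)] = patch_row
    return out


def psnr(reference, result, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale frames."""
    squared_errors = [
        (a - b) ** 2
        for ref_row, res_row in zip(reference, result)
        for a, b in zip(ref_row, res_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


# Toy 4x4 frame: uniform background (100) with two bright "text" pixels (255).
clean = [[100] * 4 for _ in range(4)]
with_text = [row[:] for row in clean]
with_text[1][1] = 255
with_text[1][2] = 255

text_box = (1, 1, 2, 3)  # assumed to come from a text detector
patch = crop(with_text, text_box)
restored = paste(with_text, placeholder_inpaint(patch, fill=100), text_box)

print(psnr(clean, with_text))  # finite: text pixels differ from the clean frame
print(psnr(clean, restored))   # inf: the toy background is restored exactly
```

In practice the constant fill would be replaced by a trained network, and PSNR would be computed per frame against ground-truth text-free frames; higher values indicate a closer match to the reference.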

Automatic text inpainting and quality elevation in video sequences
