Automatic text inpainting and quality elevation in video sequences

Lakshmi Harika Palivela, Vivekanandan Dharmalingam, D. Bala Gayathri
DOI: https://doi.org/10.1007/s11042-024-20189-9
IF: 2.577
2024-12-06
Multimedia Tools and Applications
Abstract: Scene text removal is a recent development in computer vision that replaces text patches in natural images with the appropriate background. Text removal is difficult and often leaves faulty regions in which residual text strokes remain over hazy backgrounds. Real-world text uses a wide variety of fonts, some of which are hard to localize because of their irregular shapes, varying degrees of shading, and distorted orientation. Scene text erasing can be divided into two subtasks: text detection and text inpainting. Both subtasks require large amounts of data, but existing approaches have been limited by the scarcity of real-world data for scene text removal. Even though existing works have substantially improved scene text removal, they often leave text remnants such as strokes, producing low-quality visual results. Therefore, this paper proposes an automatic text inpainting and video quality elevation model based on Improved Convolutional Network-based techniques. First, video samples are collected from diverse datasets and converted into frames. Next, the frames are deblurred using an enhanced Convolutional Neural Network (CNN) with three convolutional layers so that the text in each frame can be localized accurately. The text is then detected using a CLARA-based VGG-16 network. Afterward, the text strokes are removed with a convolutional encoder-decoder network that erases text from complex backgrounds and textures: the coordinates of the text in the deblurred frames are used to crop out the text stroke regions, these regions are inpainted, and the inpainted regions are pasted back into their original positions in the frames. Finally, the video quality is elevated with a DenseNet-centric Enhancement network.
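The crop, inpaint, and paste-back step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are assumed to be grayscale lists of pixel rows, detected text boxes are assumed to be `(x, y, width, height)` tuples, and the hypothetical `mean_fill_inpaint` function is a trivial stand-in for the paper's convolutional encoder-decoder (it only fills each box with the mean of the box's border pixels).

```python
from typing import Callable, List, Tuple

Frame = List[List[int]]          # grayscale frame, indexed as frame[row][col]
Box = Tuple[int, int, int, int]  # assumed (x, y, width, height) text box


def crop(frame: Frame, box: Box) -> Frame:
    """Copy out the pixels inside one text bounding box."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def paste(frame: Frame, patch: Frame, box: Box) -> None:
    """Write a patch back at the box's original position (in place)."""
    x, y, _, _ = box
    for dy, patch_row in enumerate(patch):
        frame[y + dy][x:x + len(patch_row)] = patch_row


def mean_fill_inpaint(patch: Frame) -> Frame:
    """Hypothetical placeholder for the encoder-decoder network: fill the
    whole patch with the rounded mean of its border pixels."""
    if not patch or not patch[0]:
        return patch  # nothing to inpaint
    h, w = len(patch), len(patch[0])
    border = [patch[r][c] for r in range(h) for c in range(w)
              if r in (0, h - 1) or c in (0, w - 1)]
    fill = round(sum(border) / len(border))
    return [[fill] * w for _ in range(h)]


def remove_text(frame: Frame, boxes: List[Box],
                inpaint: Callable[[Frame], Frame] = mean_fill_inpaint) -> Frame:
    """Crop each text region, inpaint it, and paste it back into a copy."""
    out = [row[:] for row in frame]  # keep the input frame unchanged
    for box in boxes:
        paste(out, inpaint(crop(out, box)), box)
    return out


# Usage: a 4x6 frame with a bright 2x2 "text" blob on a uniform background.
frame = [[10] * 6 for _ in range(4)]
for r, c in [(1, 2), (1, 3), (2, 2), (2, 3)]:
    frame[r][c] = 250
cleaned = remove_text(frame, [(1, 0, 4, 4)])
print(cleaned)  # every pixel is 10 again
```

Because `inpaint` is any callable that maps a patch to a same-sized patch, a trained model could replace the placeholder without changing the crop and paste logic.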
The experimental outcomes demonstrate that the proposed model effectively removes scene text and enhances video quality, reaching up to 52.09 dB PSNR and 93% SSIM and outperforming the existing methods.
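The two reported metrics can be computed as below. This is a hedged sketch, not the paper's evaluation code: images are assumed to be flattened 8-bit grayscale pixel sequences (MAX = 255), which the abstract does not specify, and `global_ssim` is a simplified single-window SSIM, whereas the standard metric averages values over local Gaussian-weighted windows. Reading "93% SSIM" as an SSIM of 0.93 is also an assumption. Under the 8-bit assumption, inverting PSNR = 10 log10(MAX²/MSE) shows that 52.09 dB corresponds to a mean squared error of roughly 0.40 per pixel.

```python
import math
from typing import Sequence


def psnr(ref: Sequence[float], test: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_val ** 2 / mse)


def mse_for_psnr(psnr_db: float, max_val: float = 255.0) -> float:
    """Invert the PSNR formula: MSE = MAX^2 / 10^(PSNR / 10)."""
    return max_val ** 2 / 10 ** (psnr_db / 10)


def global_ssim(ref: Sequence[float], test: Sequence[float], max_val: float = 255.0) -> float:
    """SSIM over the whole image as one window (simplified; standard SSIM
    averages this quantity over local Gaussian-weighted windows)."""
    n = len(ref)
    mu_x, mu_y = sum(ref) / n, sum(test) / n
    var_x = sum((a - mu_x) ** 2 for a in ref) / n
    var_y = sum((b - mu_y) ** 2 for b in test) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(ref, test)) / n
    # Stabilizing constants with the usual K1 = 0.01 and K2 = 0.03.
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


mse_at_reported_psnr = mse_for_psnr(52.09)  # about 0.40 for 8-bit frames
```

For video, one common choice is to average per-frame values, though the abstract does not say how the reported numbers were aggregated.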
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering