Self-correcting Text-Aware Strategy for Text-Caption

J. Lian, Xiaochun Cao, Pengwen Dai
DOI: https://doi.org/10.1117/12.3029640
2024-01-01
Abstract: Computer vision currently faces the problem of a "semantic gap". Existing text-based captioning methods cannot fully exploit the textual information in an image, and the text recognition step may produce incorrect results, yet there is no mechanism to correct these errors. In this work, we improve on existing approaches by proposing a text-aware recognizer that extracts textual information from the input image and generates the corresponding text descriptions and text features. Taking into account the relationship between text objects and image content, and in order to correct the semantic errors that the character recognizer introduces into the generated description sentences, we introduce a caption-rectify module that better exploits the text present in the image and models the text information recognized in the TextCaps dataset. Specifically, we use a current state-of-the-art text recognizer to detect characters and generate contextual descriptions of images. We further propose a correction mechanism and show, both qualitatively and quantitatively, that the correction makes the final caption consistent with the textual information in the image, improving the semantic accuracy of the description. We validate our approach on the text captioning task, analyze each module thoroughly, and show significant improvements over the current advanced models LSTM-R and CNMT.
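The abstract does not say how the caption-rectify module works internally. As a purely illustrative stand-in, and not the authors' method, the minimal Python sketch below shows one simple way to make a caption consistent with recognized scene text: each caption word that is a near miss of an OCR token (small but nonzero edit distance) is replaced by that token. The function names `levenshtein` and `rectify_caption` and the `max_ratio` threshold are hypothetical choices made for this example.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def rectify_caption(caption: str, ocr_tokens: List[str],
                    max_ratio: float = 0.34) -> str:
    """Hypothetical rectifier (assumption, not the paper's module):
    swap each caption word for its closest OCR token when the
    case-insensitive edit distance is nonzero and at most
    max_ratio * (length of the longer string)."""
    out = []
    for word in caption.split():
        best: Optional[str] = None
        best_d = 0
        for token in ocr_tokens:
            d = levenshtein(word.lower(), token.lower())
            if best is None or d < best_d:
                best, best_d = token, d
        # Only near misses are corrected; exact matches keep the caption's casing.
        if best is not None and 0 < best_d <= max_ratio * max(len(word), len(best)):
            out.append(best)
        else:
            out.append(word)
    return " ".join(out)


print(rectify_caption("a sign for the Starbucs cafe", ["STARBUCKS", "COFFEE"]))
```

With OCR tokens `["STARBUCKS", "COFFEE"]`, the misspelled "Starbucs" is one edit away from "STARBUCKS" and gets replaced, while "cafe" (three edits from "COFFEE") is left alone. A learned module would instead use visual and contextual features, but the sketch shows why an explicit correction step can pull a caption back toward the text actually present in the image.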