TBFormer: Two-Branch Transformer for Image Forgery Localization

Yaqi Liu, Binbin Lv, Xin Jin, Xiaoyu Chen, Xiaokun Zhang
DOI: https://doi.org/10.1109/LSP.2023.3279018
2023-02-25
Abstract: Image forgery localization aims to identify forged regions by capturing subtle traces from high-quality discriminative features. In this paper, we propose a Transformer-style network with two feature extraction branches for image forgery localization, and it is named as Two-Branch Transformer (TBFormer). Firstly, two feature extraction branches are elaborately designed, taking advantage of the discriminative stacked Transformer layers, for both RGB and noise domain features. Secondly, an Attention-aware Hierarchical-feature Fusion Module (AHFM) is proposed to effectively fuse hierarchical features from two different domains. Although the two feature extraction branches have the same architecture, their features have significant differences since they are extracted from different domains. We adopt position attention to embed them into a unified feature domain for hierarchical feature investigation. Finally, a Transformer decoder is constructed for feature reconstruction to generate the predicted mask. Extensive experiments on publicly available datasets demonstrate the effectiveness of the proposed model.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses image forgery localization: identifying the forged regions of an image by capturing subtle traces from high-quality discriminative features. As digital image-editing techniques advance, forged images become increasingly hard to tell apart from authentic ones with the naked eye, which poses a potential threat to social stability and harmony. An effective method for image forgery localization is therefore important.

### Main Problems and Background

1. **The Importance of Image Forgery Localization**:
   - Image forgery localization is an image forensics task whose goal is to locate the forged areas in the investigated image.
   - As image-editing techniques progress, forged images become more realistic and their authenticity becomes harder to judge.
2. **Limitations of Existing Methods**:
   - Existing image forgery localization methods mainly target specific types of forgery (such as splicing, copy-move, and removal), and many extract features only from the RGB domain.
   - Some methods attempt to combine features from different domains, but most are based on convolutional neural networks (CNNs) and have limited effectiveness on complex forgery types.
3. **Advantages of Transformer**:
   - In recent years, Transformers have performed well in various visual tasks, such as object detection and image segmentation.
   - Transformers can overcome limitations of convolutional neural networks, have strong global-dependency modeling capabilities, and are suitable for image forgery localization.

### Solutions Proposed in the Paper

To solve the above problems, this paper proposes a new Transformer-style network named TBFormer, with the following specific contributions:

1. **Two-Branch Feature Extractor**:
   - Two feature-extraction branches are designed to independently extract discriminative features from the RGB domain and the noise domain, respectively.
   - Each branch contains multiple Transformer layers, and the weights are not shared between branches, so that each branch focuses on feature extraction in its own domain.
2. **Attention-aware Hierarchical-feature Fusion Module (AHFM)**:
   - The AHFM is proposed to effectively fuse the hierarchical features from the two different domains.
   - A position-attention mechanism embeds the features of the different domains into a unified feature domain, and the final fused feature map is generated through element-wise addition and convolution operations.
3. **Transformer Decoder**:
   - A Transformer decoder is constructed for feature reconstruction and for generating the predicted mask.
   - Class embeddings are set in the decoder to further learn unified feature representations of the real and forged classes.
4. **Synthetic Dataset**:
   - To train and test TBFormer, a synthetic dataset containing 140,432 training images, 7,787 validation images, and 7,787 test images is generated.

### Experimental Results

- Extensive experiments show that TBFormer achieves superior performance on multiple public datasets.
- The ablation study shows that the two-branch structure and the AHFM contribute significantly to the performance improvement.
- The robustness analysis shows that TBFormer remains robust under different types of image distortion.

In conclusion, this paper proposes TBFormer, a Transformer-style network that combines features from the RGB domain and the noise domain and introduces the AHFM for effective feature fusion, achieving more accurate image forgery localization.
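
The fusion step described for the AHFM (position attention on each domain's features, then element-wise addition and a convolution) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single feature level rather than the full hierarchy, a commonly used form of position attention (softmax over query-key similarities between spatial positions, with a residual connection), and a 1x1 convolution written as a channel-mixing matrix. The names `position_attention`, `fuse`, `make_params`, and all weight shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_attention(feat, w_q, w_k, w_v, gamma=1.0):
    """Assumed form of position attention over a (C, H, W) feature map.

    Each spatial position attends to every other position; the result is
    added back to the input, scaled by gamma (a learnable scalar in practice).
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)         # (C, N) with N = H * W positions
    q = w_q @ x                        # (Cq, N) queries
    k = w_k @ x                        # (Cq, N) keys
    v = w_v @ x                        # (C, N) values
    attn = softmax(q.T @ k, axis=-1)   # (N, N); row i weights all positions j
    out = v @ attn.T                   # (C, N); out[:, i] = sum_j attn[i, j] * v[:, j]
    return (gamma * out + x).reshape(c, h, w)


def fuse(rgb_feat, noise_feat, params):
    """Embed both domains with separate attention weights, add, then mix channels."""
    rgb_emb = position_attention(rgb_feat, *params["rgb"])
    noise_emb = position_attention(noise_feat, *params["noise"])
    summed = rgb_emb + noise_emb       # element-wise addition
    c, h, w = summed.shape
    # A 1x1 convolution is a per-position linear map over channels (assumption).
    return (params["w_out"] @ summed.reshape(c, h * w)).reshape(-1, h, w)


def make_params(c, cq, c_out, rng):
    # Hypothetical random weights, standing in for learned parameters.
    def branch():
        return (rng.normal(size=(cq, c)), rng.normal(size=(cq, c)), rng.normal(size=(c, c)))
    return {"rgb": branch(), "noise": branch(), "w_out": rng.normal(size=(c_out, c))}


rng = np.random.default_rng(0)
rgb = rng.normal(size=(8, 4, 4))      # stand-in for an RGB-branch feature map
noise = rng.normal(size=(8, 4, 4))    # stand-in for a noise-branch feature map
params = make_params(c=8, cq=2, c_out=8, rng=rng)
fused = fuse(rgb, noise, params)
print(fused.shape)
```

Each domain gets its own attention weights in this sketch, mirroring the summary's point that the two branches produce features with different statistics even though their architectures match. Whether the actual module shares or separates these weights is not stated in the text above.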