Abstract: Image forgery localization aims to identify forged regions by capturing subtle traces from high-quality discriminative features. In this paper, we propose a Transformer-style network with two feature extraction branches for image forgery localization, named Two-Branch Transformer (TBFormer). Firstly, two feature extraction branches are elaborately designed, taking advantage of the discriminative stacked Transformer layers, for both RGB and noise domain features. Secondly, an Attention-aware Hierarchical-feature Fusion Module (AHFM) is proposed to effectively fuse hierarchical features from two different domains. Although the two feature extraction branches have the same architecture, their features differ significantly because they are extracted from different domains. We adopt position attention to embed them into a unified feature domain for hierarchical feature investigation. Finally, a Transformer decoder is constructed for feature reconstruction to generate the predicted mask. Extensive experiments on publicly available datasets demonstrate the effectiveness of the proposed model.

What problem does this paper attempt to address?

This paper addresses the problem of image forgery localization. Specifically, it aims to identify the forged areas in an image by capturing subtle forgery traces from high-quality discriminative features. As digital image-editing techniques have advanced, it has become increasingly difficult to judge the authenticity of forged images with the naked eye, which poses a potential threat to social stability and harmony. An effective method for image forgery localization is therefore particularly important.

### Main Problems and Background

1. **The Importance of Image Forgery Localization**:
   - Image forgery localization is an image forensics task whose goal is to locate the forged areas in the investigated image.
   - As image-editing techniques progress, forged images become more realistic and harder to authenticate, which poses a threat to social stability and harmony.
2. **Limitations of Existing Methods**:
   - Existing image forgery localization methods mainly target specific types of forgery (such as splicing, copy-move, and removal), and many extract features only from the RGB domain.
   - Some methods attempt to combine features from different domains, but most are based on convolutional neural networks (CNNs) and have limited effectiveness on complex forgery types.
3. **Advantages of Transformer**:
   - In recent years, Transformers have performed well in various visual tasks, such as object detection and image segmentation.
   - Transformers can overcome the limitations of convolutional neural networks, have strong global-dependency modeling capabilities, and are suitable for image forgery localization.

### Solutions Proposed in the Paper

To solve the above problems, this paper proposes a new Transformer-style network named TBFormer, with the following specific contributions:

1. **Two-Branch Feature Extractor**:
   - Two feature-extraction branches are designed to independently extract discriminative features from the RGB domain and the noise domain, respectively.
   - Each branch contains multiple Transformer layers, and the branches do not share weights, so that each focuses on feature extraction in its own domain.
2. **Attention-aware Hierarchical-feature Fusion Module (AHFM)**:
   - The AHFM is proposed to effectively fuse the hierarchical features from the two domains.
   - A position-attention mechanism embeds the features of the two domains into a unified feature domain, and the final fused feature map is generated through element-wise addition and convolution operations.
3. **Transformer Decoder**:
   - A Transformer decoder is constructed for feature reconstruction and for generating the predicted mask.
   - Class embeddings are set in the decoder to further learn a unified feature representation of the real and forged classes.
4. **Synthetic Dataset**:
   - To train and test TBFormer, a synthetic dataset containing 140,432 training images, 7,787 validation images, and 7,787 test images is generated.

### Experimental Results

- Extensive experiments show that TBFormer achieves superior performance on multiple public datasets.
- The ablation study shows that the two-branch structure and the AHFM contribute significantly to the performance improvement.
- The robustness analysis shows that TBFormer remains robust under different types of image distortion.

In conclusion, this paper proposes TBFormer, a Transformer-style network that combines features from the RGB domain and the noise domain and introduces the AHFM for effective feature fusion, achieving more accurate image forgery localization.
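
The fusion step described for the AHFM above (position attention on each domain, then element-wise addition and a convolution) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `position_attention` and `fuse_level`, the weight shapes, the residual connection, and the use of a 1x1 convolution (written as a per-pixel channel-mixing matrix) are all assumptions made for illustration, and the random weights stand in for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feat, w_q, w_k, w_v):
    """Hypothetical position attention over a (C, H, W) feature map.

    Every spatial position attends to every other position, so the output
    at each pixel is a weighted sum of values from the whole map.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # (C, N) with N = H * W
    q, k, v = w_q @ x, w_k @ x, w_v @ x   # 1x1 convs as matrix products
    attn = softmax(q.T @ k, axis=-1)      # (N, N), each row sums to 1
    out = v @ attn.T                      # (C, N): aggregate values per position
    return (x + out).reshape(c, h, w)     # residual connection (assumption)

def fuse_level(rgb_feat, noise_feat, p):
    """Fuse one hierarchy level: attention per domain, add, then 1x1 conv."""
    rgb = position_attention(rgb_feat, *p["rgb"])
    noise = position_attention(noise_feat, *p["noise"])
    summed = rgb + noise                  # element-wise addition
    c, h, w = summed.shape
    mixed = p["proj"] @ summed.reshape(c, h * w)  # 1x1 conv (assumption)
    return mixed.reshape(-1, h, w)

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4

def make_attn_weights():
    # Query/key use a reduced channel dimension; value keeps all channels.
    return (rng.normal(size=(C // 2, C)),
            rng.normal(size=(C // 2, C)),
            rng.normal(size=(C, C)))

params = {"rgb": make_attn_weights(),
          "noise": make_attn_weights(),
          "proj": rng.normal(size=(C, C))}

fused = fuse_level(rng.normal(size=(C, H, W)),
                   rng.normal(size=(C, H, W)),
                   params)
print(fused.shape)  # (8, 4, 4)
```

Each domain gets its own attention weights here, mirroring the summary's point that the two branches do not share weights; in a hierarchical setting, `fuse_level` would be applied once per feature level.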

TBFormer: Two-Branch Transformer for Image Forgery Localization

LBRT: Local-Information-Refined Transformer for Image Copy–Move Forgery Detection

Progressive Feedback-Enhanced Transformer for Image Forgery Localization

F2Trans: High-Frequency Fine-Grained Transformer for Face Forgery Detection

Cross-attention based two-branch networks for document image forgery localization in the Metaverse

End-to-end Image Splicing Localization Based on Multi-Scale Features and Residual Refinement Module

Two-stream Encoder-Decoder Network for Localizing Image Forgeries

TBFormer: three-branch efficient transformer for semantic segmentation

Multi-Scenario Remote Sensing Image Forgery Detection Based on Transformer and Model Fusion

CNN-Transformer Based Generative Adversarial Network for Copy-Move Source/Target Distinguishment

Face Forgery Detection with Long-Range Noise Features and Multilevel Frequency-Aware Clues

Branch-Transformer: A Parallel Branch Architecture to Capture Local and Global Features for Language Identification

CMFDFormer: Transformer-based Copy-Move Forgery Detection with Continual Learning

ET: Edge-Enhanced Transformer for Image Splicing Detection

Double-branch forgery image detection based on multi-scale feature fusion

DMFF-Net: Double-stream multilevel feature fusion network for image forgery localization

Can Deep Network Balance Copy-Move Forgery Detection and Distinguishment?

MTFDN: An image copy‐move forgery detection method based on multi‐task learning

LMAFormer: Local Motion Aware Transformer for Small Moving Infrared Target Detection

MGQFormer: Mask-Guided Query-Based Transformer for Image Manipulation Localization

UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization