Semantic Consistency-Relevant Multitask Splicing-Tampered Detection
Zhang Yulin, Wang Hongxia, Zhang Rui, Zhang Jingyuan
DOI: https://doi.org/10.11834/jig.220549
2023-01-01
Journal of Image and Graphics
Abstract:
Objective: Software for editing and modifying digital images has become widely available, and forged digital images have become a pressing problem for image forensics. Splicing, which inserts new object instances into an original image, is one of the most common ways to falsify or misrepresent the semantics of that image. Conventional convolutional neural network (CNN)-based methods detect forged images mainly by finding anomalies in the statistical information and physical features of the image itself, such as edge and noise features, but they have difficulty capturing the semantic inconsistencies that splicing introduces. In addition, tampering detection is made harder by post-processing applied to the image, such as compression or image filtering.
Method: To detect spliced forgeries, we propose a CNN-based, multi-resolution detection network that incorporates semantic segmentation and noise reconstruction tasks. The network consists of four components: 1) an RGB stream, 2) a noise stream, 3) a fusion module, and 4) a multi-task module. The RGB stream extracts artifacts along tampered boundaries as well as semantic information. The noise stream uses a steganalysis-based filter layer to extract noise features of the forged regions; together, RGB and noise information provide complementary evidence for forgery detection. Within the multi-task module, the semantic segmentation task captures semantic inconsistencies, the noise reconstruction task helps the network learn a more diverse image noise distribution, and the forgery detection task locates the tampered regions. As in recent multi-task networks, each task has its own loss function, and the overall loss of the network is the sum of the per-task losses.
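The abstract states only that the noise stream uses a steganalysis-based filter layer. A common choice for such a layer is a fixed convolution with high-pass kernels from the spatial rich model (SRM). The sketch below assumes those three 5×5 kernels and a single grayscale channel, written in pure Python for clarity; `srm_noise_maps` and `correlate_valid` are hypothetical helper names, not the authors' code. Because every kernel sums to zero, flat image content is suppressed and only local noise residuals survive.

```python
# Assumption: three SRM high-pass kernels commonly used in forgery-detection
# noise streams; the exact kernels used in this paper are not stated.
_RAW_KERNELS = [
    (4, [[0, 0, 0, 0, 0],
         [0, -1, 2, -1, 0],
         [0, 2, -4, 2, 0],
         [0, -1, 2, -1, 0],
         [0, 0, 0, 0, 0]]),
    (12, [[-1, 2, -2, 2, -1],
          [2, -6, 8, -6, 2],
          [-2, 8, -12, 8, -2],
          [2, -6, 8, -6, 2],
          [-1, 2, -2, 2, -1]]),
    (2, [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 1, -2, 1, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]),
]
# Normalise each kernel by its scale factor so the centre weight is -1.
SRM_KERNELS = [[[v / scale for v in row] for row in k] for scale, k in _RAW_KERNELS]


def correlate_valid(image, kernel):
    """'Valid' 2-D cross-correlation, i.e. what a CNN conv layer computes."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - k + 1):
        row = []
        for x in range(w - k + 1):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    acc += kernel[dy][dx] * image[y + dy][x + dx]
            row.append(acc)
        out.append(row)
    return out


def srm_noise_maps(gray_image):
    """Apply each fixed (non-trainable) kernel: one noise residual map per kernel."""
    return [correlate_valid(gray_image, k) for k in SRM_KERNELS]
```

In a real network these kernels would be loaded as the frozen weights of the first convolution in the noise stream and applied per colour channel.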
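The overall objective described above, the sum of the per-task losses, amounts to L = L_det + L_seg + L_noise. The sketch assumes binary cross-entropy for the forgery detection mask, categorical cross-entropy for semantic segmentation, and mean squared error for noise reconstruction, all with unit weights; the abstract names none of these, so the loss choices and function names are hypothetical. Inputs are flattened per-pixel lists to keep the example dependency-free.

```python
import math

EPS = 1e-7  # clamp probabilities to avoid log(0)


def bce_loss(pred, target):
    """Mean binary cross-entropy between tamper probabilities and a 0/1 mask."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, EPS), 1 - EPS)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def ce_loss(class_probs, labels):
    """Mean categorical cross-entropy; class_probs[i] is pixel i's class distribution."""
    return sum(-math.log(max(probs[y], EPS)) for probs, y in zip(class_probs, labels)) / len(labels)


def mse_loss(pred, target):
    """Mean squared error between reconstructed and reference noise values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(det_pred, det_mask, seg_probs, seg_labels, noise_pred, noise_target):
    """Unweighted sum of the three per-task losses (weights assumed to be 1)."""
    parts = {
        "detection": bce_loss(det_pred, det_mask),
        "segmentation": ce_loss(seg_probs, seg_labels),
        "noise_reconstruction": mse_loss(noise_pred, noise_target),
    }
    return sum(parts.values()), parts
```

Returning the individual terms alongside the total makes it easy to log each task separately during training.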
To further strengthen the spatial co-occurrence of the two feature types, the fusion module fuses the features from the RGB and noise streams before they are passed to the forgery detection task. Additionally, to obtain richer and more accurate features, a multi-resolution pathway is applied to the RGB stream, the noise stream, and the feature fusion module. This pathway lets the network perceive both semantic information and precise location information, which benefits the localization-oriented forgery detection task.
Result: We compare our network with six tamper detection networks on the Fantastic Reality dataset and the Spliced Dataset: 1) the manipulation tracing network (ManTra-Net), 2) the coarse-to-refined network (C2RNet), 3) the multi-task wavelet corrected network (MWC-Net), 4) the compression artifact tracing network (CAT-Net), 5) the ringed residual U-Net (RRU-Net), and 6) a baseline network based on the high-resolution network (HRNet). Training and testing are performed on an Intel Core i7-9700K CPU and an NVIDIA GeForce RTX 2080 Ti GPU. During training, stochastic gradient descent (SGD) with a momentum of 0.9 is used as the optimizer, with an initial learning rate of 0.005 and exponential decay. Our network achieves F1 scores of 0.946 on Fantastic Reality and 0.961 on Spliced Dataset. A timing comparison shows that our design strikes an effective balance between computational cost and detection performance. In practice, the most common compression format is JPEG, and image filters are typically used to adjust contrast and brightness.
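The optimizer settings above (SGD, momentum 0.9, initial learning rate 0.005, exponential decay) can be illustrated on a toy one-parameter problem. The decay factor `gamma = 0.95` per epoch, the epoch/step counts, and the quadratic objective are hypothetical, since the abstract gives no decay rate; the update rule follows the heavy-ball form used by PyTorch's `torch.optim.SGD` with zero dampening.

```python
def lr_for_epoch(epoch, base_lr=0.005, gamma=0.95):
    """Exponential decay: multiply the learning rate by gamma once per epoch (gamma assumed)."""
    return base_lr * gamma ** epoch


def sgd_momentum_step(params, grads, velocity, lr, momentum=0.9):
    """Heavy-ball SGD: v <- momentum * v + g, then w <- w - lr * v."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [w - lr * v for w, v in zip(params, new_velocity)]
    return new_params, new_velocity


def train_toy(epochs=30, steps_per_epoch=20, target=3.0):
    """Minimise the toy objective f(w) = (w - target)^2 starting from w = 0."""
    params, velocity = [0.0], [0.0]
    for epoch in range(epochs):
        lr = lr_for_epoch(epoch)  # fixed within an epoch, decayed between epochs
        for _ in range(steps_per_epoch):
            grads = [2.0 * (params[0] - target)]  # df/dw
            params, velocity = sgd_momentum_step(params, grads, velocity, lr)
    return params[0]
```

Decaying per epoch rather than per step mirrors how an exponential scheduler is usually stepped once at the end of each training epoch.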
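The reported F1 scores measure how well the predicted tamper mask overlaps the ground-truth mask, with F1 = 2TP / (2TP + FP + FN). The sketch below assumes pixel-level F1 with a 0.5 threshold on predicted probabilities, averaged over images; the abstract does not state the threshold or whether scores are averaged per image or pooled over all pixels, and returning 1.0 for an image with no positives in either mask is a convention choice. Function names are hypothetical.

```python
def pixel_f1(pred_probs, gt_mask, threshold=0.5):
    """Pixel-level F1 for one image; inputs are flattened per-pixel lists."""
    tp = fp = fn = 0
    for p, t in zip(pred_probs, gt_mask):
        pred = 1 if p >= threshold else 0  # binarise the predicted probability
        if pred and t:
            tp += 1
        elif pred and not t:
            fp += 1
        elif not pred and t:
            fn += 1
    denom = 2 * tp + fp + fn
    # Convention: nothing predicted and nothing tampered counts as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


def mean_f1(samples, threshold=0.5):
    """Average per-image F1 over (pred_probs, gt_mask) pairs."""
    scores = [pixel_f1(p, m, threshold) for p, m in samples]
    return sum(scores) / len(scores)
```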
Therefore, to reflect realistic scenarios, we design robustness experiments on the Fantastic Reality dataset using four common types of image post-processing: JPEG compression, contrast adjustment, brightness adjustment, and noise distortion.
Conclusion: We propose a semantic consistency-based multi-task, multi-resolution tampering detection network that detects forged regions effectively and accurately. The multi-task strategy extracts semantic features and locates forged regions by exploiting the semantic inconsistencies in forged images, while the multi-resolution design lets the network obtain more diverse image information. Furthermore, the robustness experiments show that the network has potential for detecting forgeries in JPEG-compressed images.
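Three of the four post-processing operations used in the robustness experiments can be sketched on a grayscale image stored as nested lists of 0 to 255 values. The parameter values and function names are hypothetical, as the abstract does not give the distortion levels; contrast is modelled as scaling deviations from the mean intensity, and noise distortion is assumed to be additive Gaussian noise. JPEG compression needs a real codec (for example, re-encoding with Pillow's `Image.save(buffer, format="JPEG", quality=q)`) and is left out to keep the sketch dependency-free.

```python
import random


def clip(v):
    """Keep a pixel value inside the valid 8-bit range."""
    return min(255.0, max(0.0, v))


def adjust_brightness(img, delta):
    """Shift every pixel by a constant offset."""
    return [[clip(p + delta) for p in row] for row in img]


def adjust_contrast(img, factor):
    """Scale each pixel's deviation from the image mean by `factor`."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[clip(mean + factor * (p - mean)) for p in row] for row in img]


def add_gaussian_noise(img, sigma, seed=0):
    """Add zero-mean Gaussian noise; a fixed seed keeps test sets reproducible."""
    rng = random.Random(seed)
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]
```

A robustness sweep would apply each operation at several strengths to the test images and report F1 at each level.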