CVTStego-Net: A convolutional vision transformer architecture for spatial image steganalysis

Mario Alejandro Bravo-Ortiz, Esteban Mercado-Ruiz, Juan Pablo Villa-Pulgarin, Carlos Angel Hormaza-Cardona, Sebastian Quiñones-Arredondo, Harold Brayan Arteaga-Arteaga, Simon Orozco-Arias, Oscar Cardona-Morales, Reinel Tabares-Soto
DOI: https://doi.org/10.1016/j.jisa.2023.103695
IF: 4.96
2024-01-10
Journal of Information Security and Applications
Abstract: Research on image steganalysis in the spatial domain has concentrated mainly on convolutional neural network (CNN) designs. However, existing CNNs enlarge the local receptive field used to capture steganographic noise without modeling the global dependencies of that noise. This study introduces CVTStego-Net, a convolutional vision transformer for spatial-domain image steganalysis that combines the strengths of convolutions with those of attention mechanisms to capture both local and global dependencies. CVTStego-Net is composed of three stages: a preprocessing stage, a noise extraction and analysis stage, and a classification stage. The preprocessing stage uses a bifurcation with 30 SRM (Spatial Rich Models) filters, in trainable and untrainable form, to enhance the steganographic noise. The noise extraction and analysis stage combines the SE-Block (Squeeze-and-Excitation) with residual operations to increase sensitivity to steganographic noise and suppress the influence of redundant information, and the classification stage combines the SE-Block with a convolutional vision transformer to connect the local and global spatial relationships of the steganographic noise. CVTStego-Net improved classification accuracy for the tested steganographic algorithms compared with YEDROUDJ-Net, SR-Net, ZHU-Net, GBRAS-Net, and SNMC-Net. Specifically, on test data from BOSSbase 1.01, the accuracy of CVTStego-Net for WOW was 86.58% at 0.2 bits per pixel (bpp) and 93.80% at 0.4 bpp. For S-UNIWARD at 0.2 and 0.4 bpp, the accuracies were 80.70% and 90.45%, respectively; for MiPOD, 74.70% and 81.48%; for HILL, 76.70% and 85.80%; and for HUGO, 78.20% and 86.98%. The results demonstrate that convolutional vision transformers can classify steganographic images in the spatial domain.
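The first two stages described in the abstract (noise-residual preprocessing, then SE-Blocks with residual connections) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single SRM kernel (the 5x5 "KV" high-pass filter) in place of the 30-filter bank, models the bifurcation as one fixed branch plus one "trainable" branch whose kernel is simply a perturbed copy of KV (no training happens here), and omits biases in the SE-Block. The helper names (`conv2d_valid`, `preprocess`, `se_block`, `se_residual`) are hypothetical.

```python
import numpy as np

# KV kernel: a fixed 5x5 high-pass filter from the SRM family. Its
# coefficients sum to zero, so flat image regions produce a zero residual.
KV = np.array([
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
], dtype=np.float64) / 12.0


def conv2d_valid(image, kernel):
    """2-D cross-correlation with 'valid' padding, as in CNN conv layers."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def preprocess(image, trainable_kernel):
    """Hypothetical two-branch preprocessing: fixed KV branch + trainable branch.

    Returns a (2, H-4, W-4) stack of noise residuals.
    """
    fixed = conv2d_valid(image, KV)
    learned = conv2d_valid(image, trainable_kernel)
    return np.stack([fixed, learned])


def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) map (biases omitted)."""
    z = x.mean(axis=(1, 2))                  # squeeze: global average pool -> (C,)
    h = np.maximum(w1 @ z, 0.0)              # bottleneck FC + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # FC + sigmoid -> channel weights in (0, 1)
    return x * s[:, None, None]              # reweight each channel


def se_residual(x, w1, w2):
    """Residual connection around the SE block: x + SE(x)."""
    return x + se_block(x, w1, w2)


rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(16, 16)).astype(np.float64)
# Hypothetical stand-in for a kernel initialized to KV and then updated by training.
trainable_kv = KV + 0.01 * rng.standard_normal(KV.shape)

features = preprocess(cover, trainable_kv)   # shape (2, 12, 12)
w1 = rng.standard_normal((1, 2))             # reduction ratio r = 2
w2 = rng.standard_normal((2, 1))
out = se_residual(features, w1, w2)
print(features.shape, out.shape)             # (2, 12, 12) (2, 12, 12)
```

Because the SE weights come from a sigmoid, each channel is scaled by a factor in (0, 1) before being added back, so the residual path preserves the original noise residual while the SE path emphasizes some channels over others.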
computer science, information systems
What problem does this paper attempt to address?
### What problem does this paper attempt to solve?

This paper addresses a key limitation of spatial-domain image steganalysis: when detecting steganographic noise, existing convolutional neural network (CNN) methods focus only on local features and ignore the global relationships within the steganographic noise. Specifically:

1. **Limitations of existing methods**:
   - Research on spatial-domain image steganalysis has concentrated mainly on CNN designs.
   - These methods capture steganographic noise by enlarging the local receptive field, but they ignore the global dependencies of that noise.
2. **Proposed method**:
   - The paper introduces a new architecture, CVTStego-Net (Convolutional Vision Transformer for Steganalysis), which combines convolutional layers with attention mechanisms to capture local and global dependencies simultaneously.
   - CVTStego-Net consists of three stages: the preprocessing stage, the noise extraction and analysis stage, and the classification stage.
3. **Key contributions**:
   - **Preprocessing stage**: trainable and non-trainable SRM (Spatial Rich Models) filters enhance the steganographic noise.
   - **Noise extraction and analysis stage**: SE-Blocks (Squeeze-and-Excitation) combined with residual operations increase sensitivity to steganographic noise and suppress the influence of redundant information.
   - **Classification stage**: an SE-Block combined with a convolutional vision transformer connects the local and global spatial relationships of the steganographic noise.
4. **Experimental results**:
   - CVTStego-Net achieves better classification accuracy than YEDROUDJ-Net, SR-Net, ZHU-Net, GBRAS-Net, and SNMC-Net on multiple steganographic algorithms.
   - For WOW, S-UNIWARD, MiPOD, HILL, and HUGO at payloads of 0.2 and 0.4 bits per pixel (bpp), CVTStego-Net improves accuracy over these baselines; for example, it reaches 86.58% and 93.80% on WOW at 0.2 and 0.4 bpp, respectively, on BOSSbase 1.01 test data.

### Summary

By introducing CVTStego-Net, this paper addresses the tendency of existing CNN methods to focus only on local features in spatial-domain image steganalysis. It proposes an architecture that captures local and global dependencies of the steganographic noise simultaneously, thereby improving steganalysis accuracy.
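The central idea, that convolutions supply local context while attention supplies global context, can be illustrated with a minimal NumPy sketch of a convolutional attention layer in the style of CvT (Convolutional vision Transformer), where queries, keys, and values are produced by a depthwise 3x3 convolution followed by a pointwise linear projection. This is an assumption-laden illustration, not CVTStego-Net's actual classification stage: the single attention head, the omission of batch normalization and biases, the token dimension `D`, and the helper names (`depthwise_conv_same`, `conv_attention`) are all hypothetical.

```python
import numpy as np


def depthwise_conv_same(x, kernels):
    """Depthwise 3x3 convolution with zero 'same' padding.

    x: (C, H, W); kernels: (C, 3, 3), one kernel per channel.
    """
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3), axis=(1, 2))
    # windows: (C, H, W, 3, 3); each channel is filtered only by its own kernel
    return np.einsum("chwij,cij->chw", windows, kernels)


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def conv_attention(x, dw_q, dw_k, dw_v, wq, wk, wv):
    """Single-head attention with convolutional Q/K/V projections.

    Each projection = depthwise 3x3 conv (local context) followed by a
    pointwise linear map; attention then mixes all H*W tokens (global context).
    Returns the (H*W, D) output tokens and the (H*W, H*W) attention matrix.
    """
    c, h, w = x.shape

    def project(dw, wmat):
        local = depthwise_conv_same(x, dw)    # (C, H, W)
        tokens = local.reshape(c, h * w).T    # (H*W, C): one token per position
        return tokens @ wmat                  # (H*W, D)

    q, k, v = project(dw_q, wq), project(dw_k, wk), project(dw_v, wv)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))  # every token attends to every token
    return attn @ v, attn


rng = np.random.default_rng(1)
C, H, W, D = 4, 6, 6, 8
x = rng.standard_normal((C, H, W))            # stand-in for SE-reweighted noise features
dws = [rng.standard_normal((C, 3, 3)) for _ in range(3)]
ws = [rng.standard_normal((C, D)) / np.sqrt(C) for _ in range(3)]
out, attn = conv_attention(x, *dws, *ws)
print(out.shape, attn.shape)                  # (36, 8) (36, 36)
```

The depthwise convolution only sees a 3x3 neighborhood of each position, whereas each row of `attn` spans all 36 positions, which is the local-plus-global combination the summary describes.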