Abstract: The principal investigations in spatial-domain image steganalysis have concentrated on convolutional neural network (CNN) designs. However, existing CNNs enlarge the local receptive field for steganographic noise without considering global steganographic noise. This study introduces CVTStego-Net, a convolutional vision transformer for spatial-domain image steganalysis. It combines the strengths of convolutions with the advantages of attention mechanisms to capture both local and global dependencies.

CVTStego-Net is composed of three stages: a preprocessing stage, a noise extraction and analysis stage, and a classification stage. The preprocessing stage uses a bifurcation with trainable and non-trainable versions of 30 SRM (Spatial Rich Models) filters to enhance steganographic noise. The noise extraction and analysis stage combines SE-Blocks (Squeeze-and-Excitation) with residual operations. This increases sensitivity to steganographic noise and suppresses the influence of redundant information. The classification stage combines an SE-Block with a convolutional vision transformer to connect the local and global spatial relationships of the steganographic noise.

CVTStego-Net improved classification accuracy over YEDROUDJ-Net, SR-Net, ZHU-Net, GBRAS-Net, and SNMC-Net. The following accuracies were measured on test data from BOSSbase 1.01:

- WOW: 86.58% at 0.2 bpp and 93.80% at 0.4 bpp.
- S-UNIWARD: 80.70% and 90.45%.
- MiPOD: 74.70% and 81.48%.
- HILL: 76.70% and 85.80%.
- HUGO: 78.20% and 86.98%.

The results demonstrate that convolutional vision transformers can classify steganographic images in the spatial domain.
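As a rough illustration of the preprocessing stage, the sketch below applies SRM-style high-pass kernels to a grayscale image to expose its noise residual. It makes several assumptions:

- It uses NumPy.
- It uses only two well-known SRM kernels as a stand-in for the 30-filter bank: the 5x5 "KV" kernel and a first-order horizontal difference.
- It models the trainable branch as a copy of the fixed kernels. In a real network, backpropagation would update that copy.
- `filter_same` and `srm_preprocess` are hypothetical helpers, not the paper's code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Two well-known SRM high-pass kernels, standing in for the 30-filter bank
# (assumption: kernel choice and reflect padding are illustrative only).
KV = np.array([[-1,  2,  -2,  2, -1],
               [ 2, -6,   8, -6,  2],
               [-2,  8, -12,  8, -2],
               [ 2, -6,   8, -6,  2],
               [-1,  2,  -2,  2, -1]], dtype=np.float64) / 12.0
FIRST_ORDER = np.array([[0,  0, 0],
                        [0, -1, 1],
                        [0,  0, 0]], dtype=np.float64)


def filter_same(image, kernel):
    """2-D cross-correlation with reflect padding; output has the input's size."""
    k = kernel.shape[0]
    padded = np.pad(image, k // 2, mode="reflect")
    windows = sliding_window_view(padded, kernel.shape)  # (H, W, k, k)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def srm_preprocess(image, kernels, trainable_kernels):
    """Hypothetical two-branch preprocessing.

    One branch uses fixed SRM kernels. The other uses a trainable copy,
    which is static here. Returns residual maps stacked as (C, H, W).
    """
    fixed = [filter_same(image, k) for k in kernels]
    learned = [filter_same(image, k) for k in trainable_kernels]
    return np.stack(fixed + learned, axis=0)


# Usage: a random 16x16 "cover" image through both branches.
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(16, 16)).astype(np.float64)
kernels = [KV, FIRST_ORDER]
trainable = [k.copy() for k in kernels]
residuals = srm_preprocess(cover, kernels, trainable)
print(residuals.shape)  # (4, 16, 16)
```

Every kernel here sums to zero, so flat image content is cancelled and high-frequency components dominate the output. This is why SRM filters are used to suppress image content and emphasize the small changes introduced by embedding.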


### What problem does this paper attempt to solve?

This paper addresses an important problem in spatial-domain image steganalysis. When detecting steganographic noise, existing convolutional neural network (CNN) methods focus only on local features and ignore the relationships within global steganographic noise. Specifically:

1. **Limitations of existing methods**:
   - Research on spatial-domain image steganalysis has concentrated mainly on CNN designs.
   - These methods capture steganographic noise by enlarging the local receptive field, but they ignore the dependencies of global steganographic noise.
2. **Proposed method**:
   - The paper introduces a new architecture, CVTStego-Net (Convolutional Vision Transformer for Steganalysis). It combines convolutional layers with attention mechanisms to capture local and global dependencies simultaneously.
   - CVTStego-Net consists of three stages: the preprocessing stage, the noise extraction and analysis stage, and the classification stage.
3. **Innovations**:
   - **Preprocessing stage**: A bifurcation with trainable and non-trainable SRM (Spatial Rich Models) filters enhances the steganographic noise.
   - **Noise extraction and analysis stage**: SE-Blocks (Squeeze-and-Excitation) combined with residual operations increase sensitivity to steganographic noise and suppress the influence of redundant information.
   - **Classification stage**: An SE-Block combined with a convolutional vision transformer connects the local and global spatial relationships of the steganographic noise.
4. **Experimental results**:
   - CVTStego-Net achieves higher classification accuracy than YEDROUDJ-Net, SR-Net, ZHU-Net, GBRAS-Net, and SNMC-Net across multiple steganographic algorithms.
   - On BOSSbase 1.01 test data, at payloads of 0.2 and 0.4 bits per pixel (bpp), its accuracy is:
     - 86.58% and 93.80% for WOW.
     - 80.70% and 90.45% for S-UNIWARD.
     - 74.70% and 81.48% for MiPOD.
     - 76.70% and 85.80% for HILL.
     - 78.20% and 86.98% for HUGO.

### Summary

By introducing CVTStego-Net, this paper addresses a limitation of existing CNN methods for spatial-domain image steganalysis: they focus only on local features. The proposed architecture captures local and global dependencies of steganographic noise simultaneously, which improves steganalysis accuracy.
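To make the local/global distinction concrete, here is a minimal NumPy forward pass. It assumes a (C, H, W) feature map coming out of the convolutional stages and runs in two steps:

1. A Squeeze-and-Excitation block, wrapped in a residual connection, reweights the channels.
2. Single-head self-attention lets every spatial position attend to every other.

The following are illustrative assumptions, not CVTStego-Net's actual layer configuration:

- All weights are random placeholders.
- The reduction ratio `r`, the token dimension `d`, and the final two-class projection are made up for the example.
- Plain linear query/key/value projections stand in for the convolutional projections a CvT-style block would use.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.

    Squeeze: global average pooling gives a C-vector.
    Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights in (0, 1).
    """
    z = x.mean(axis=(1, 2))           # (C,)
    s = sigmoid(w2 @ relu(w1 @ z))    # (C,)
    return x * s[:, None, None]


def se_residual(x, w1, w2):
    """Residual wrapper: the reweighted features are added back to the input."""
    return x + se_block(x, w1, w2)


def global_self_attention(x, wq, wk, wv):
    """Single-head self-attention over all H*W positions (global context)."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                    # (N, C), N = H*W
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv   # (N, d) each
    scores = q @ k.T / np.sqrt(q.shape[1])            # (N, N)
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)           # row-wise softmax
    return attn @ v, attn                             # (N, d), (N, N)


# Usage with random placeholder weights (hypothetical sizes).
rng = np.random.default_rng(42)
C, H, W, r, d = 8, 4, 4, 2, 8
feats = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
wq, wk, wv = (rng.standard_normal((C, d)) * 0.1 for _ in range(3))

y = se_residual(feats, w1, w2)
out, attn = global_self_attention(y, wq, wk, wv)
logits = out.mean(axis=0) @ rng.standard_normal((d, 2))  # cover vs. stego scores
```

The attention matrix has one row and one column per spatial position, so its cost grows quadratically with H*W. A convolution kernel, by contrast, only mixes a fixed local neighbourhood.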

CVTStego-Net: A convolutional vision transformer architecture for spatial image steganalysis

Image steganalysis with convolutional vision transformer

A Blind Steganalytic Scheme Based on DCT and Spatial Domain for JPEG Images

Blind JPEG Steganalysis Using Features Derived from Multi-Domain

Highly Accurate End-to-end Image Steganalysis Based on Auxiliary Information and Attention Mechanism

Spatial Steganalysis Based on Non-Local Block and Multi-Channel Convolutional Networks

Structural Design of Convolutional Neural Networks for Steganalysis

CNN-Assisted Steganography -- Integrating Machine Learning with Established Steganographic Techniques

CIS-Net: A Novel CNN Model for Spatial Image Steganalysis via Cover Image Suppression

Maximizing steganalysis performance using siamese networks for image

A Novel Technique for Image Steganalysis Based on Separable Convolution and Adversarial Mechanism

Forensic Video Steganalysis in Spatial Domain by Noise Residual Convolutional Neural Network

Color Image Steganalysis Based on Pixel Difference Convolution and Enhanced Transformer With Selective Pooling

ISCMIS: Spatial-Channel Attention Based Deep Invertible Network for Multi-Image Steganography

Global texture sensitive convolutional transformer for medical image steganalysis

StegaVision: Enhancing Steganography with Attention Mechanism

CIRNet: An Improved Lightweight Convolution Neural Network Architecture with Inverted Residuals for Universal Steganalysis

A Siamese CNN for Image Steganalysis

Deeply‐Recursive Attention Network for Video Steganography

Multi-contextual design of convolutional neural network for steganalysis

Image Steganalysis Network Based on Dual-Attention Mechanism