Detection of QIM-Based Steganography in VoIP Streams: A MobileViT-Inspired Model

Cheng Zhang, Shujuan Jiang
DOI: https://doi.org/10.1109/lsp.2024.3419691
2024-01-01
IEEE Signal Processing Letters
Abstract: Over the past decades, many studies have proposed VoIP steganalysis models for quantization index modulation (QIM) based steganography. However, most of these models do not take resource consumption into account, which limits the scenarios in which they can be applied. Inspired by MobileViT, we propose in this letter a lightweight VoIP steganalysis model named LStegT (Lightweight steganalysis transformer), which combines the strengths of convolutional neural networks (CNNs) and transformers. First, LStegT uses 1D depthwise separable convolutions to capture the correlations among codewords (local correlations). Then, LStegT applies a transformer encoder to encode the correlations among frames (global correlations). The depthwise separable convolutions can significantly reduce computational resource consumption. In addition, the transformer encoder in LStegT requires fewer trainable parameters because it only needs to encode the correlations among frames. In this letter, we show how we designed the architecture of LStegT based on the characteristics of VoIP streams and QIM-based steganography, and we explain why LStegT is lightweight from the MobileViT perspective. Finally, our experimental results show that LStegT performs very well in detecting QIM-based steganography in VoIP streams.
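To make the resource argument concrete, the following minimal sketch compares the weight count of a standard 1D convolution with that of a depthwise separable 1D convolution (a per-channel depthwise filter followed by a 1x1 pointwise convolution). The channel counts and kernel size are illustrative assumptions and are not taken from the letter; the helper names are hypothetical.

```python
def conv1d_params(c_in, c_out, k):
    """Weight count of a standard 1D convolution (bias omitted)."""
    return c_in * c_out * k


def depthwise_separable_conv1d_params(c_in, c_out, k):
    """Weight count of a depthwise separable 1D convolution (bias omitted)."""
    depthwise = c_in * k        # one length-k filter per input channel
    pointwise = c_in * c_out    # 1x1 convolution that mixes the channels
    return depthwise + pointwise


# Illustrative sizes only (assumed, not from the letter).
c_in, c_out, k = 64, 64, 3

standard = conv1d_params(c_in, c_out, k)
separable = depthwise_separable_conv1d_params(c_in, c_out, k)
ratio = separable / standard    # equals 1/c_out + 1/k

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {ratio:.3f}")
```

With these assumed sizes the separable layer needs roughly a third of the weights of the standard one; the saving grows with larger kernels and more output channels, since the ratio is 1/c_out + 1/k.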