TENet: Leveraging Transformer Encoders for Steganalysis of QIM Steganography in VoIP Speech Streams

Cheng Zhang, Shujuan Jiang, Zhong Chen
DOI: https://doi.org/10.1007/s11042-023-17802-8
IF: 2.577
2023-01-01
Multimedia Tools and Applications
Abstract: Quantization index modulation (QIM) based steganography allows confidential information to be concealed in Voice-over-Internet-Protocol (VoIP) speech streams. Cyber attackers and lawbreakers could exploit this technique to carry out malicious activities. In this paper, we raise the problems of polynomial codewords (PCs) and bag-of-codewords in VoIP steganalysis. To address these problems, we introduce a simple yet more robust and dynamic way of applying codeword and position embeddings to VoIP streams. Building on these embeddings, we design a QIM-based VoIP steganalysis model named TENet. TENet extracts a latent codeword representation that captures both the potential meanings of codewords and the correlations among them, using codeword embedding, codeword position embedding, and a transformer encoder. The resulting codeword representation is then used for sample classification. Experimental results show that TENet outperforms other state-of-the-art QIM-based VoIP steganalysis methods. TENet also performs well in the testing-time and resource-consumption experiments.
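The role of the position embedding can be illustrated with a minimal sketch. It is not the authors' implementation: the table sizes, the random initialisation, the element-wise sum, and the names `make_table` and `embed_frames` are all assumptions made for illustration. It only shows that adding a position embedding gives the same codeword a different vector at different positions, which an order-insensitive bag-of-codewords view cannot do. In TENet these vectors would then be passed to a transformer encoder, which is omitted here.

```python
import random

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
VOCAB_SIZE = 8   # number of distinct codeword indices
MAX_LEN = 4      # number of positions in one sample
DIM = 3          # embedding dimension


def make_table(num_rows, dim, rng):
    """Build a lookup table of random vectors, one row per index."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


def embed_frames(codewords, codeword_table, position_table):
    """Return one vector per position: codeword embedding + position embedding."""
    return [
        [c + p for c, p in zip(codeword_table[cw], position_table[pos])]
        for pos, cw in enumerate(codewords)
    ]


rng = random.Random(0)
codeword_table = make_table(VOCAB_SIZE, DIM, rng)
position_table = make_table(MAX_LEN, DIM, rng)

# Two sequences containing the same codewords in a different order.
seq_a = [1, 5, 2, 7]
seq_b = [7, 2, 5, 1]

emb_a = embed_frames(seq_a, codeword_table, position_table)
emb_b = embed_frames(seq_b, codeword_table, position_table)

# Codeword 1 sits at position 0 in seq_a and at position 3 in seq_b.
same_codeword_same_vector = emb_a[0] == emb_b[3]
print("codeword 1 has identical vectors at both positions:", same_codeword_same_vector)
```

With the codeword table alone, codeword 1 would map to the same vector wherever it occurs; the added position term is what lets a downstream encoder distinguish the two orderings.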