Abstract: In recent years, a growing number of information hiding techniques have been built on network streaming media. These techniques focus on embedding secret information covertly and efficiently into network media signals transmitted in real time, thereby achieving covert communication. Misuse of these techniques poses significant security risks, such as the spread of malicious code, commands, and viruses. Current steganalysis methods for network voice streams face two major challenges: efficient detection at low embedding rates and efficient detection of short-duration samples. These challenges arise because, at low embedding rates (e.g., as low as 10%) and short transmission durations (e.g., only 0.1 seconds), detection models struggle to acquire sufficiently rich sample features, which makes effective steganalysis difficult. To address these challenges, this paper introduces a Dual-View VoIP Steganalysis Framework (DVSF). The framework first randomly obfuscates part of the native steganographic descriptors in VoIP stream segments, making the steganographic features of hard-to-detect samples more pronounced and easier to learn. It then captures fine-grained local features related to steganography, building on the global features of VoIP. Specially constructed VoIP segment triplets further adjust the feature distances within the model. Together, these components effectively address the detection difficulty in VoIP steganalysis. Extensive experiments demonstrate that our method significantly improves the accuracy of streaming voice steganalysis in these challenging detection scenarios, surpassing existing state-of-the-art methods and offering superior near-real-time performance.
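The random obfuscation step can be illustrated with a minimal sketch. The abstract does not say how descriptors are represented or obfuscated, so the following are assumptions: a VoIP segment is modeled as a list of frames, each frame holding a fixed number of integer codeword indices (the descriptors), and obfuscation resamples a randomly chosen fraction of those positions uniformly from each descriptor's codebook range. The function name `obfuscate_descriptors`, the `ratio` parameter, and the codebook sizes in the usage lines are hypothetical and not taken from the paper.

```python
import random
from typing import List, Optional, Sequence


def obfuscate_descriptors(
    segment: List[List[int]],
    codebook_sizes: Sequence[int],
    ratio: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[List[int]]:
    """Return a copy of `segment` with a random fraction of descriptors resampled.

    Assumed representation (hypothetical, not from the paper): `segment[t][j]`
    is the j-th integer codeword index of frame t, and descriptor j takes
    values in range(codebook_sizes[j]).
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if any(len(frame) != len(codebook_sizes) for frame in segment):
        raise ValueError("every frame must have one value per codebook")
    rng = rng if rng is not None else random.Random()

    # Copy frames so the caller's segment is left unchanged.
    out = [list(frame) for frame in segment]

    # Enumerate every (frame, descriptor) position and pick k of them at random.
    positions = [(t, j) for t in range(len(out)) for j in range(len(codebook_sizes))]
    k = round(ratio * len(positions))
    for t, j in rng.sample(positions, k):
        # Replace the chosen codeword with a uniform draw from its own codebook.
        out[t][j] = rng.randrange(codebook_sizes[j])
    return out


# Usage: a hypothetical 10-frame segment with three descriptors per frame.
rng = random.Random(0)
segment = [[t % 128, t % 32, (3 * t) % 32] for t in range(10)]
masked = obfuscate_descriptors(segment, codebook_sizes=[128, 32, 32], ratio=0.2, rng=rng)
```

If frames are assumed to be 10 ms long, a 0.1-second segment holds only 10 frames, so `ratio=0.2` over three descriptors per frame resamples 6 of 30 positions. Drawing replacements from each descriptor's own codebook, rather than zeroing them, keeps obfuscated values in the same domain as real codewords; this is a design choice of the sketch, not a detail reported in the paper.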

What problem does this paper attempt to address?

The main problem this paper addresses is improving the efficiency and accuracy of VoIP (Voice over Internet Protocol) voice-stream steganalysis at low embedding rates and short durations. Current steganalysis methods struggle to obtain sufficient sample features when the embedding rate is low (e.g., as low as 10%) and the transmission duration is short (e.g., only 0.1 seconds), which makes effective steganalysis difficult. To address these challenges, the paper introduces a Dual-View VoIP Steganalysis Framework (DVSF).

### Specific description of the problem

1. **Low-embedding-rate detection**: At a low embedding rate, only a small amount of information is hidden, so the detection model has difficulty distinguishing voice streams that carry steganographic information from normal voice streams.
2. **Short-duration detection**: Short voice segments contain fewer features, which further increases the difficulty of detection.
3. **Real-time requirement**: To be effective and timely in practical applications, the steganalysis method needs near-real-time detection capability.

### Solution

The DVSF framework proposed in the paper addresses these challenges as follows:

1. **Randomly obfuscating some local steganographic descriptors**: Randomly obfuscating part of the local steganographic descriptors of VoIP stream segments makes the steganographic features of hard-to-detect samples more pronounced and therefore easier for the model to learn.
2. **Capturing fine-grained local features**: Building on global features, the framework captures fine-grained local features related to steganography to strengthen the model's ability to learn steganographic features.
3. **Specially constructed VoIP segment triplets**: Adjusting feature distances within the model makes the features of normal and steganographic VoIP segments more easily linearly separable in the model's feature space.

### Experimental results

Extensive experiments show that this method significantly improves the accuracy of streaming voice steganalysis in these challenging detection scenarios, surpasses existing state-of-the-art methods, and provides superior near-real-time performance.

### Summary

The paper tackles VoIP steganalysis at low embedding rates and short durations, proposes a dual-view framework, and improves detection efficiency and accuracy through the techniques above, supporting network security and public safety.
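The triplet step adjusts feature distances so that normal and steganographic segments become separable in feature space. The text does not give the paper's exact loss, distance, margin, or triplet-selection rule, so the sketch below swaps in the standard triplet margin loss with Euclidean distance. The margin of 1.0, the role assignment (two steganographic segments as anchor and positive, a cover segment as negative), and the helper names `triplet_loss` and `mean_triplet_loss` are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 1.0) -> float:
    """Standard triplet margin loss with Euclidean distance:
    max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_pos = math.dist(anchor, positive)  # distance to a same-class segment
    d_neg = math.dist(anchor, negative)  # distance to an opposite-class segment
    return max(0.0, d_pos - d_neg + margin)


def mean_triplet_loss(triplets: List[Tuple[Vector, Vector, Vector]], margin: float = 1.0) -> float:
    """Average the triplet loss over a batch of (anchor, positive, negative) embeddings."""
    if not triplets:
        raise ValueError("need at least one triplet")
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)


# Usage with toy 2-D embeddings (values are illustrative, not model outputs).
# Anchor and positive: two steganographic segments; negative: a cover segment.
easy = ([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # negative already far from the anchor
hard = ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])  # negative as close as the positive
print(triplet_loss(*easy))  # 0.0
print(triplet_loss(*hard))  # 1.0
print(mean_triplet_loss([easy, hard]))  # 0.5
```

Only the hard triplet contributes non-zero loss, which illustrates why triplet construction matters: triplets whose negative already lies farther from the anchor than the positive by at least the margin add nothing to training.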

Efficient Streaming Voice Steganalysis in Challenging Detection Scenarios
