Abstract: To better model long-range semantic dependencies, we introduce a novel approach to linguistic steganography through English translation. The method, called NMT-stega (Neural Machine Translation-steganography), builds on attention mechanisms and probability distribution theory. Specifically, to optimize translation accuracy and make full use of the source text, we employ an attention-based NMT model as the translation component. To limit the degradation of text quality caused by embedding secret information, we devise a dynamic word-pick policy based on probability variance. This policy adaptively constructs an alternative (candidate) set and adjusts the embedding capacity at each time step, guided by variance thresholds. We also incorporate prior knowledge into the model by introducing a hyper-parameter that balances the contributions of the source and target text when predicting the embedded words. Extensive ablation experiments and comparative analyses on a large-scale Chinese-English corpus validate the proposed method in terms of embedding rate, text quality, anti-steganography (resistance to steganalysis), and semantic distance. In the anti-steganography experiments, NMT-stega outperforms the alternative approaches, achieving the highest scores on two steganalysis models, NFZ-WDA (53) and LS-CNN (56.4). Even when generating longer sentences, with average lengths reaching 47 words, the method maintains strong semantic relationships, as evidenced by a semantic distance of 87.916. Finally, evaluated as a machine translation system, the method achieves a Bilingual Evaluation Understudy (BLEU) score of 42.103 and a perplexity of 23.592.
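The variance-guided word-pick idea can be illustrated with a small sketch. This is not the authors' implementation: the threshold values, the `top_k` cut-off, the mapping from variance to bit count (high variance embeds nothing, low variance embeds more), and the names `bits_for_step`, `pick_word`, and `extract_bits` are all assumptions made for illustration. The hyper-parameter that balances source and target text is not modelled here.

```python
from statistics import pvariance

# Hypothetical thresholds: (high, low). Not taken from the paper.
THRESHOLDS = (0.02, 0.005)


def bits_for_step(probs, thresholds=THRESHOLDS, top_k=8):
    """Decide how many secret bits to embed at one time step.

    Assumed rule: a peaked distribution (high variance among the top
    probabilities) means one word clearly fits, so nothing is embedded;
    a flat distribution (low variance) allows a larger candidate set.
    """
    top = sorted(probs, reverse=True)[:top_k]
    var = pvariance(top)
    high, low = thresholds
    n_bits = 0 if var >= high else 1 if var >= low else 2
    # The candidate set needs 2**n_bits words; shrink if too few exist.
    while (1 << n_bits) > len(top):
        n_bits -= 1
    return n_bits


def pick_word(dist, bitstream, pos):
    """Choose the next word from `dist` (word -> probability).

    Returns (word, new_pos). The candidate set is the 2**n_bits most
    probable words; the next n_bits of the secret select one of them.
    """
    ranked = sorted(dist, key=dist.get, reverse=True)
    n_bits = bits_for_step(list(dist.values()))
    if n_bits == 0:
        return ranked[0], pos
    # Pad with zeros if the secret runs out mid-step (assumed convention).
    chunk = bitstream[pos:pos + n_bits].ljust(n_bits, "0")
    return ranked[int(chunk, 2)], pos + n_bits


def extract_bits(dist, word):
    """Receiver side: recover the bits carried by `word` given the same `dist`."""
    ranked = sorted(dist, key=dist.get, reverse=True)
    n_bits = bits_for_step(list(dist.values()))
    if n_bits == 0:
        return ""
    return format(ranked.index(word), f"0{n_bits}b")


if __name__ == "__main__":
    steps = [
        {"the": 0.9, "a": 0.05, "an": 0.03, "this": 0.02},         # peaked
        {"big": 0.27, "large": 0.26, "huge": 0.24, "vast": 0.23},  # flat
        {"runs": 0.4, "walks": 0.3, "goes": 0.2, "moves": 0.1},    # in between
    ]
    secret, pos, words = "101", 0, []
    for dist in steps:
        word, pos = pick_word(dist, secret, pos)
        words.append(word)
    recovered = "".join(extract_bits(d, w) for d, w in zip(steps, words))
    print(words, recovered)
```

In this sketch the receiver must reproduce the same probability distribution at every step, which is why extraction recomputes the bit count from `dist` rather than receiving it separately.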

Effective linguistic steganography detection

Detection of audio-to-image audio steganography based on peak frequency feature

A DCT-Based Image Steganographic Method Resisting Statistical Attacks

Linguistic Steganography Detection Using Statistical Characteristics of Correlations Between Words

A Statistical Algorithm for Linguistic Steganography Detection Based on Distribution of Words

Linguistic Steganography Detection Algorithm Using Statistical Language Model

A co-occurrence matrix based approach to detect JPEG steganography

Linguistic Steganography Detection Based on Perplexity

An efficient linguistic steganography for Chinese text

Detection of Substitution-Based Linguistic Steganography by Relative Frequency Analysis

Linguistic Steganography: from Symbolic Space to Semantic Space

Neural Linguistic Steganography with Controllable Security

Least significant bit steganography detection with machine learning techniques

Blind Linguistic Steganalysis Against Translation Based Steganography

A Statistical Attack on a Kind of Word-Shift Text-Steganography

Covert Communication By Exploring Statistical And Linguistical Distortion In Text

High-Performance Linguistic Steganalysis, Capacity Estimation and Steganographic Positioning

A Novel Approach to Detect the Presence of LSB Steganographic Messages

A novel method for linguistic steganography by English translation using attention mechanism and probability distribution theory

Analysis and detection of text steganographic tool-Stegparty

Algorithm for Detecting Steganographic Information Based on Characteristic of Embedded Message