Abstract: Covert communication (also known as steganography) is the practice of concealing a secret inside an innocuous-looking public object (cover) so that the modified public object (covert code) makes sense to everyone but only someone who knows the code can extract the secret (message). Linguistic steganography is the practice of encoding a secret message in natural language, for example spoken conversation or short public communications such as tweets. While ad hoc methods for covert communication in specific domains exist (JPEG images, Chinese poetry, etc.), there is no general model for linguistic steganography specifically. We present a novel mathematical formalism for creating linguistic steganographic codes, with three parameters: decodability (the probability that the receiver of the coded message will decode the cover correctly), density (the frequency of codewords in a covert code), and detectability (the probability that an attacker can distinguish an untampered cover from its steganized version). Verbal or linguistic steganography is particularly challenging because it lacks artifacts in which to hide the secret message. We detail a practical construction in Python of a steganographic code for tweets that uses inserted words to encode hidden digits and uses n-gram frequency distortion as the measure of detectability of the insertions. Using the publicly accessible Stanford Sentiment Analysis dataset, we implemented the tweet steganization scheme: a codeword (an existing word in the dataset) is inserted at random positions in randomly chosen existing tweets, and the tweet with the least n-gram distortion is selected. We argue that this approximates the KL divergence in a localized manner at low cost, yielding a linguistic steganography scheme that is both formal and practical and that permits a tradeoff between codeword density and detectability of the covert message.
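
The insertion search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bigram model with add-one smoothing, the scoring function `insertion_distortion` (a localized change in log-likelihood around the insertion point), the toy corpus, and the digit-to-codeword `codebook` are all assumptions chosen for illustration. Only the general idea, trying a codeword at random positions in random tweets and keeping the least-distorted result, comes from the abstract.

```python
import math
import random
from collections import Counter


def train_bigram_model(tweets):
    """Count unigrams and bigrams over whitespace-tokenized tweets."""
    unigrams, bigrams = Counter(), Counter()
    for tweet in tweets:
        tokens = ["<s>"] + tweet.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(prev, word, model):
    """Add-one smoothed log P(word | prev) (smoothing is an assumption)."""
    unigrams, bigrams = model
    vocab_size = len(unigrams)
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))


def insertion_distortion(tokens, pos, codeword, model):
    """Hypothetical local distortion of inserting codeword before tokens[pos].

    Compares the log-probability of the single bigram that is broken up
    with the two bigrams created by the insertion. Higher means the
    insertion looks less natural under the bigram model.
    """
    padded = ["<s>"] + tokens + ["</s>"]
    left, right = padded[pos], padded[pos + 1]
    before = log_prob(left, right, model)
    after = log_prob(left, codeword, model) + log_prob(codeword, right, model)
    return before - after


def steganize(tweets, codeword, model, trials=200, seed=0):
    """Try random (tweet, position) pairs and keep the least-distorted one."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        tokens = rng.choice(tweets).lower().split()
        pos = rng.randint(0, len(tokens))  # inclusive: allows appending
        score = insertion_distortion(tokens, pos, codeword, model)
        if best is None or score < best[0]:
            best = (score, tokens[:pos] + [codeword] + tokens[pos:], pos)
    return best


# Toy corpus standing in for the tweet dataset (illustrative only).
corpus = [
    "i love this movie so much",
    "this is so good",
    "i really love the weather today",
    "the movie was really good",
]
model = train_bigram_model(corpus)
codebook = {"7": "really"}  # hypothetical digit-to-codeword mapping
score, stego_tokens, pos = steganize(corpus, codebook["7"], model)
print(" ".join(stego_tokens), round(score, 3))
```

Under these assumptions, raising `trials` searches more candidate covers for each codeword and should lower the distortion of the chosen tweet, while packing more codewords into each tweet would raise density at the cost of higher distortion.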
