Abstract: Most current text steganalysis methods are based on deep neural networks (DNNs). In real-world scenarios, however, obtaining enough labeled stego-text to train networks with a large number of parameters is often difficult and costly. In addition, because of dataset bias (also called domain shift), recognition models trained on one large dataset generalize poorly to novel datasets and tasks. To address the scarcity of labeled data and the inadequate generalization of models in text steganalysis, this paper proposes PDTS, a cross-domain stego-text analysis method based on pseudo-labeling and unsupervised domain adaptation. Specifically, we propose a model architecture that combines pre-trained BERT with a single-layer Bi-LSTM to extract generic features shared across tasks and to generate task-specific representations. Because different features contribute differently to steganalysis, we further design a feature filtering mechanism that propagates features selectively, thereby enhancing classification performance. We train the model on labeled source-domain data and adapt it to the target-domain data distribution through self-training, using pseudo-labels assigned to the unlabeled target-domain data. In the label estimation step, instead of a static sampling strategy, we propose a progressive sampling strategy that gradually increases the number of selected pseudo-label candidates. Experimental results demonstrate that our method performs well on zero-shot text steganalysis tasks, achieving high detection accuracy even when no labeled data is available in the target domain, and that it outperforms current zero-shot text steganalysis methods.
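The progressive sampling idea in the label estimation step can be illustrated with a minimal sketch. The abstract does not specify the schedule or the selection criterion, so everything here is an assumption made for illustration: the linear growth schedule, the confidence measure, and the names `progressive_quota`, `select_pseudo_labels`, `start_frac`, and `step_frac` are hypothetical and are not taken from PDTS.

```python
from typing import List, Tuple


def progressive_quota(round_idx: int, total: int,
                      start_frac: float = 0.2,
                      step_frac: float = 0.2) -> int:
    """Number of pseudo-labeled target samples to keep in a given round.

    Assumed linear schedule (not from the paper): the kept fraction grows
    by step_frac each self-training round and is capped at the full set.
    """
    frac = min(1.0, start_frac + step_frac * round_idx)
    return int(round(frac * total))


def select_pseudo_labels(probs: List[float], quota: int) -> List[Tuple[int, int]]:
    """Keep the `quota` most confident predictions as pseudo-labels.

    probs[i] is the predicted probability that target sample i is stego.
    Returns (sample_index, pseudo_label) pairs, where 1 = stego, 0 = cover.
    """
    # Confidence is the probability of the predicted class (assumed criterion).
    confidence = [max(p, 1.0 - p) for p in probs]
    ranked = sorted(range(len(probs)), key=lambda i: confidence[i], reverse=True)
    return [(i, 1 if probs[i] >= 0.5 else 0) for i in ranked[:quota]]


# Hypothetical predictions on five unlabeled target-domain texts.
target_probs = [0.95, 0.55, 0.10, 0.48, 0.80]
for round_idx in range(3):
    quota = progressive_quota(round_idx, len(target_probs))
    chosen = select_pseudo_labels(target_probs, quota)
    # In a real self-training loop the model would be retrained on `chosen`
    # and target_probs would be recomputed before the next round.
```

A static strategy would fix `quota` for every round. Growing it over rounds admits only the most confident pseudo-labels while the model is still poorly adapted, which is the intuition the abstract describes.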
