Linguistic Steganalysis in Few-Shot Scenario

Huili Wang, Zhongliang Yang, Jinshuai Yang, Cheng Chen, Yongfeng Huang
DOI: https://doi.org/10.1109/TIFS.2023.3298210
2023-01-01
Abstract: Due to the widespread use of text in cyberspace, linguistic steganography, which hides secret information in ordinary text, has developed rapidly in recent years. While linguistic steganography protects users' privacy, it also risks being abused to endanger network security. Its corresponding detection technology, linguistic steganalysis, has therefore attracted growing attention from researchers over the past several years. However, most current linguistic steganalysis methods rely heavily on a large number of labeled samples, which leaves a significant gap from real-world scenarios where labeled steganographic samples are difficult to obtain. In this paper, we propose the Pre-trained Language model with Self-training for Few-shot Linguistic Steganalysis (LSFLS) method, which copes with few-shot linguistic steganalysis using a small number of labeled samples and some auxiliary unlabeled samples. Extensive experiments show that the proposed method achieves high detection accuracy when only a few labeled samples are provided (even fewer than 10), a significant improvement over existing methods in the few-shot scenario. Furthermore, the experimental results demonstrate that the proposed method maintains good detection capability under data source mismatch and label imbalance. We believe that our work will greatly advance the practical application of linguistic steganalysis techniques.
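The abstract gives no algorithmic detail, so the following is only a minimal sketch of generic self-training (pseudo-labeling), the family of techniques the method's name refers to; it is not the LSFLS implementation. It assumes a toy nearest-centroid classifier over a single scalar feature in place of a pre-trained language model, and every name in it (`fit_centroids`, `predict_with_confidence`, `self_train`, `threshold`, `max_rounds`) is hypothetical.

```python
from statistics import mean


def fit_centroids(features, labels):
    """Return the mean feature value per class (0 = cover text, 1 = stego text)."""
    return {c: mean(x for x, y in zip(features, labels) if y == c)
            for c in set(labels)}


def predict_with_confidence(centroids, x):
    """Predict the nearest class and a confidence derived from relative distances."""
    dist = {c: abs(x - m) for c, m in centroids.items()}
    label = min(dist, key=dist.get)
    total = sum(dist.values())
    conf = 1.0 if total == 0 else 1.0 - dist[label] / total
    return label, conf


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.8, max_rounds=5):
    """Grow the labeled set with confident pseudo-labels, refitting each round."""
    x, y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(x, y)
        remaining, added = [], 0
        for u in pool:
            label, conf = predict_with_confidence(centroids, u)
            if conf >= threshold:
                # Confident prediction: treat it as a training label.
                x.append(u)
                y.append(label)
                added += 1
            else:
                # Too uncertain: leave it unlabeled for a later round.
                remaining.append(u)
        pool = remaining
        if added == 0:
            break  # nothing new was pseudo-labeled, so further rounds change nothing
    return fit_centroids(x, y), pool


# Toy data (assumed, not from the paper): two labeled samples, three unlabeled.
centroids, still_unlabeled = self_train(
    [0.0, 1.0], [0, 1], [0.1, 0.9, 0.5], threshold=0.75
)
```

In this toy run the two unlabeled samples near the class centroids are pseudo-labeled, while the ambiguous sample at 0.5 stays in the unlabeled pool. The confidence threshold is the main design choice in such a loop: a higher value admits fewer but more reliable pseudo-labels.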