Abstract: With the rapid development of natural language processing techniques, language models are increasingly used for text classification and sentiment analysis. However, language models are susceptible to piracy and redistribution by adversaries, which poses a serious threat to the intellectual property of model owners. Researchers have therefore been designing protection mechanisms that identify the copyright information of language models. Existing watermarking schemes for text classification language models, however, cannot be associated with the owner’s identity, are not sufficiently robust, and cannot regenerate their trigger sets. To solve these problems, a new black-box watermarking scheme for text classification tasks is proposed, which allows model ownership to be verified remotely and quickly. The model owner’s copyright message and key are processed with a Hash-based Message Authentication Code (HMAC); the resulting message digest resists forgery and provides high security. A certain amount of text data is randomly selected from each category of the original training set, the digest is combined with this text data to construct the trigger set, and the watermark is then embedded in the language model during training. To evaluate the proposed scheme, watermarks were embedded in three common language models on the IMDB movie review and CNews text classification datasets. The experimental results show that watermark verification accuracy can reach 100% without degrading the performance of the original model. Even under common attacks such as model fine-tuning and pruning, the proposed scheme remains strongly robust, and it also resists forgery attacks. Meanwhile, embedding the watermark does not affect the convergence time of the model, so the embedding efficiency is high.
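The trigger-set construction described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the digest is combined with the text, how trigger labels are chosen, or which hash function is used, so the following are assumptions made only for illustration: HMAC-SHA256, a short digest prefix prepended to each sampled text, a target label derived from the digest, and sampling seeded by the digest so that the owner can regenerate the same trigger set. The names `make_digest`, `build_trigger_set`, and `verify` are hypothetical.

```python
import hashlib
import hmac
import random


def make_digest(key: bytes, copyright_msg: str) -> str:
    """Return the HMAC-SHA256 hex digest of the owner's copyright message."""
    return hmac.new(key, copyright_msg.encode("utf-8"), hashlib.sha256).hexdigest()


def build_trigger_set(dataset, digest, per_class=2, mark_len=8):
    """Sample texts from every category and stamp them with the digest.

    dataset: list of (text, label) pairs from the original training set.
    Assumptions (not taken from the paper): the mark is the first
    `mark_len` hex characters of the digest, prepended to the text; the
    target label is picked from the digest; sampling is seeded by the
    digest so the same key and message reproduce the same trigger set.
    """
    rng = random.Random(digest)                      # deterministic sampling
    labels = sorted({label for _, label in dataset})
    target = labels[int(digest, 16) % len(labels)]   # digest-derived label
    mark = digest[:mark_len]
    triggers = []
    for label in labels:
        pool = [text for text, lab in dataset if lab == label]
        for text in rng.sample(pool, min(per_class, len(pool))):
            triggers.append((f"{mark} {text}", target))
    return triggers


def verify(predict, triggers) -> float:
    """Black-box check: fraction of triggers the suspect model labels as expected."""
    hits = sum(predict(text) == label for text, label in triggers)
    return hits / len(triggers)
```

In this sketch, the trigger samples would be mixed into the training data so that the model learns the marked inputs, and ownership would later be claimed by querying a suspect model through `predict` and comparing the returned fraction with a chosen threshold. Because the digest depends on both the key and the copyright message, a party without the key cannot reproduce the same marks under these assumptions.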
