Abstract: Adapting pre-trained deep learning models to customized tasks has become a popular way for developers to cope with limited computational resources and data. In particular, probing (training a downstream head on a frozen pre-trained encoder) is widely adopted in transfer learning because it helps prevent overfitting and catastrophic forgetting. However, this generalizability of pre-trained encoders raises concerns that probing could be misused for harmful purposes, such as discriminatory speculation and warfare applications. In this work, we introduce EncoderLock, a novel applicability authorization method designed to protect pre-trained encoders from malicious probing, i.e., to make them yield poor performance on specified prohibited domains while maintaining their utility in authorized ones. Achieving this balance is challenging because the two optimization objectives oppose each other and because adversaries can adaptively choose among a variety of downstream heads. To address these challenges, EncoderLock employs two techniques: domain-aware weight selection and updating, which restricts applications on prohibited domains/tasks, and a self-challenging training scheme, which iteratively strengthens resistance against any downstream classifiers that adversaries may apply. Moreover, recognizing that data from prohibited domains may be scarce in practice, we introduce three EncoderLock variants for different levels of data accessibility: supervised (prohibited domain data with labels), unsupervised (prohibited domain data without labels), and zero-shot (no data or labels available). We verify EncoderLock's effectiveness and practicality with a real-world pre-trained Vision Transformer (ViT) encoder from Facebook. These results underscore the contributions EncoderLock brings to the development of responsible AI.
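
For readers unfamiliar with the term, the following is a minimal sketch of probing as the abstract defines it: a downstream head is trained while the encoder stays fixed. It is not EncoderLock and not the authors' code. The encoder here is a small random projection standing in for a real pre-trained model, and all names (`frozen_encoder`, `train_probe`, `probe_accuracy`), the toy data, and the hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def frozen_encoder(x, W):
    """Hypothetical stand-in for a pre-trained encoder; W is never updated."""
    return np.tanh(x @ W)

def train_probe(feats, labels, num_classes, lr=0.5, steps=300):
    """Train a linear softmax head on fixed features with plain gradient descent."""
    n, d = feats.shape
    head = np.zeros((d, num_classes))
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = feats @ head
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy with respect to the head only.
        head -= lr * feats.T @ (probs - onehot) / n
    return head

def probe_accuracy(feats, labels, head):
    """Fraction of samples the probe classifies correctly."""
    return float(((feats @ head).argmax(axis=1) == labels).mean())

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16)) * 0.3      # assumed "pre-trained" weights
W_before = W.copy()

x = rng.normal(size=(200, 8))           # toy downstream inputs
labels = (x[:, 0] > 0).astype(int)      # toy binary downstream task

feats = frozen_encoder(x, W)            # features are computed once and reused
head = train_probe(feats, labels, num_classes=2)
acc = probe_accuracy(feats, labels, head)
print(f"probe training accuracy: {acc:.2f}")
```

Only `head` receives gradient updates, which is what makes probing cheap for a developer and, by the same token, cheap for an adversary. A protection method in the spirit described above would aim for `acc` to stay low on prohibited domains and high on authorized ones.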

IPES: Improved Pre-trained Encoder Stealing Attack in Contrastive Learning

Watermarking Pre-trained Encoders in Contrastive Learning

PtbStolen: Pre-trained Encoder Stealing Through Perturbed Samples

StolenEncoder: Stealing Pre-trained Encoders in Self-supervised Learning

Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services

GhostEncoder: Stealthy Backdoor Attacks with Dynamic Triggers to Pre-trained Encoders in Self-supervised Learning

CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning

Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples

Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing

Apple of Sodom: Hidden Backdoors in Superior Sentence Embeddings via Contrastive Learning

Manipulating Pre-Trained Encoder for Targeted Poisoning Attacks in Contrastive Learning

BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning

Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction

AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning

StegGuard: Fingerprinting Self-supervised Pre-trained Encoders via Secrets Embeder and Extractor

Downstream-agnostic Adversarial Examples

Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders

Mitigating Backdoor Attacks in Pre-Trained Encoders via Self-Supervised Knowledge Distillation

Hack Me If You Can: Aggregating AutoEncoders for Countering Persistent Access Threats Within Highly Imbalanced Data