Abstract: Though pre-trained encoders can be easily accessed online to build downstream machine learning (ML) services quickly, various attacks have been designed to compromise the security and privacy of these encoders. While most attacks target encoders on the upstream side, it remains unknown how an encoder could be threatened when deployed in a downstream ML service. This paper unveils a new vulnerability: the Pre-trained Encoder Inference (PEI) attack, which poses privacy threats to encoders hidden behind downstream ML services. Given only API access to a targeted downstream service and a set of candidate encoders, the PEI attack can infer which of the candidate encoders is secretly used by the targeted service. We evaluate the attack performance of PEI against real-world encoders on three downstream tasks: image classification, text classification, and text-to-image generation. Experiments show that the PEI attack succeeds in revealing the hidden encoder in most cases and seldom makes mistakes even when the hidden encoder is not in the candidate set. We also conduct a case study on LLaVA, one of the most recent vision-language models, to illustrate that the PEI attack can assist other ML attacks, such as adversarial attacks. The code is available at https://github.com/fshp971/encoder-inference.

What problem does this paper attempt to address?

This paper explores a novel privacy threat, the Pre-trained Encoder Inference (PEI) attack, which targets pre-trained encoders deployed in downstream machine learning services. Specifically, the goal of the PEI attack is to infer which pre-trained encoder is covertly used by a target service, given only API access to that downstream service and to a set of candidate encoders. The paper first introduces the background and related work, then describes the working principle and design framework of the attack, and how to implement it for different data modalities (such as images and text).

### Main Contributions

1. **Revealing a New Vulnerability**: Proposes a new attack, PEI, which can infer the pre-trained encoder hidden in a downstream machine learning service.
2. **General Framework**: Proposes a general black-box attack framework that carries out PEI attacks in a task-agnostic manner.
3. **Experimental Validation**: Conducts experiments on three downstream tasks (image classification, text classification, and text-to-image generation) to validate the effectiveness of the PEI attack.
4. **Case Study**: Demonstrates through a case study how PEI attacks can assist adversarial attacks on the multimodal model LLaVA.
5. **Defense Discussion**: Provides a preliminary discussion of potential defenses against PEI attacks.

### Attack Framework

The PEI attack consists of two stages:

- **PEI Attack Sample Synthesis Stage**: Generates attack samples by minimizing, under a specific candidate encoder, the difference between the embeddings of the attack samples and those of the target samples.
- **Hidden Encoder Inference Stage**: Queries the target service with the generated attack samples and evaluates how similarly it behaves on the attack samples and the target samples, thereby inferring whether the hidden encoder belongs to the candidate set and, if so, which encoder it is.

### Features

- The attacker only needs API-level access to carry out the attack.
- PEI attacks successfully reveal hidden encoders in most cases with a low false positive rate.
- The cost of the attack is relatively low, approximately a few hundred dollars per candidate encoder.

In summary, this paper examines the privacy threats that pre-trained encoders may face when deployed in downstream services and proposes an effective attack to reveal these hidden encoders. It further demonstrates the attack's effectiveness through experiments and discusses potential defense strategies.
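The two-stage framework can be illustrated with a minimal toy sketch. It rests on several assumptions that are not from the paper: each candidate "encoder" is a random map f(x) = tanh(Wx); the downstream service is a hidden encoder plus a linear head that returns only class labels; attack samples are built by gradient descent from random source samples; and a candidate is accepted only if the service assigns its attack samples the same label as their targets at least 80% of the time. The names `make_encoder`, `make_service`, `synthesize`, and `infer_hidden_encoder`, the label-agreement score, and the 0.8 threshold are all hypothetical. This is not the authors' implementation or their actual similarity measure.

```python
import numpy as np

D_IN, D_EMB, N_CLASSES = 32, 8, 4  # toy dimensions (hypothetical)


def make_encoder(seed: int) -> np.ndarray:
    """Hypothetical toy 'pre-trained encoder' f(x) = tanh(W x); returns W."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(D_EMB, D_IN)) / np.sqrt(D_IN)


def encode(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Embed a batch of column vectors X with shape (D_IN, n)."""
    return np.tanh(W @ X)


def make_service(W_hidden: np.ndarray, seed: int = 1):
    """Toy black-box downstream service: hidden encoder + linear head, labels only."""
    V = np.random.default_rng(seed).normal(size=(N_CLASSES, D_EMB))
    return lambda X: np.argmax(V @ encode(W_hidden, X), axis=0)


def synthesize(W, X_src, X_tgt, steps=2000, lr=0.2):
    """Stage 1 sketch: perturb X_src by gradient descent on
    ||f_W(X) - f_W(X_tgt)||^2 so each column matches its target's embedding."""
    H_tgt = encode(W, X_tgt)
    X = X_src.copy()
    for _ in range(steps):
        H = encode(W, X)
        # Gradient of sum((tanh(W X) - H_tgt)^2) w.r.t. X is 2 W^T [(1 - H^2) * (H - H_tgt)].
        X -= lr * 2.0 * W.T @ ((1.0 - H**2) * (H - H_tgt))
    return X


def infer_hidden_encoder(service, candidates, n_pairs=50, threshold=0.8, seed=2):
    """Stage 2 sketch: for each candidate, measure how often the service gives an
    attack sample the same label as its target. Return the best candidate's index
    if its score clears the (assumed) threshold, else None ('not in candidate set')."""
    rng = np.random.default_rng(seed)
    X_src = rng.normal(size=(D_IN, n_pairs))
    X_tgt = rng.normal(size=(D_IN, n_pairs))
    y_tgt = service(X_tgt)  # black-box queries on the target samples
    scores = []
    for W in candidates:
        X_atk = synthesize(W, X_src, X_tgt)
        scores.append(float(np.mean(service(X_atk) == y_tgt)))
    best = int(np.argmax(scores))
    return (best if scores[best] >= threshold else None), scores


if __name__ == "__main__":
    candidates = [make_encoder(s) for s in (10, 11, 12)]
    # Service secretly built on candidate #1.
    guess, scores = infer_hidden_encoder(make_service(candidates[1]), candidates)
    print("inferred:", guess, "scores:", scores)
    # Service built on an encoder outside the candidate set.
    guess, scores = infer_hidden_encoder(make_service(make_encoder(99)), candidates)
    print("inferred:", guess, "scores:", scores)
```

When the hidden encoder is among the candidates, that candidate's attack samples reproduce the targets' embeddings and therefore their labels, so its score approaches 1. A wrong candidate's attack samples shift the hidden embedding only partially toward the target, so its score stays well below the threshold. This is how the sketch can also answer "none of the candidates."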

Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services
