Bounding-box Watermarking: Defense against Model Extraction Attacks on Object Detectors
Satoru Koda,Ikuya Morikawa
2024-11-20
Abstract: Deep neural networks (DNNs) deployed in a cloud often allow users to query models via the APIs. However, these APIs expose the models to model extraction attacks (MEAs). In this attack, the attacker attempts to duplicate the target model by abusing the responses from the API. Backdoor-based DNN watermarking is known as a promising defense against MEAs, wherein the defender injects a backdoor into extracted models via API responses. The backdoor is used as a watermark of the model; if a suspicious model has the watermark (i.e., backdoor), it is verified as an extracted model. This work focuses on object detection (OD) models. Existing backdoor attacks on OD models are not applicable for model watermarking as the defense against MEAs on a realistic threat model. Our proposed approach involves inserting a backdoor into extracted models via APIs by stealthily modifying the bounding-boxes (BBs) of objects detected in queries while keeping the OD capability. In our experiments on three OD datasets, the proposed approach succeeded in identifying the extracted models with 100% accuracy in a wide variety of experimental scenarios.
Cryptography and Security,Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve
The paper addresses a specific threat to object detection (OD) models served from the cloud: model extraction attacks (MEAs). When deep neural networks (DNNs) are deployed in the cloud and offer predictions through APIs, attackers can abuse the API responses to replicate the functionality of the target model. Such replication violates the model owner's intellectual property rights and may cause economic losses to the service provider.
To counter this threat, the paper proposes a backdoor-based watermarking method called "Bounding-Box Watermarking" (BBW). The method injects a backdoor into any extracted model by stealthily modifying the bounding boxes (BBs) of detected objects in the API responses. The backdoor serves as a watermark that can be used to verify ownership: if a suspicious model contains the backdoor, that is, it outputs abnormal bounding boxes for specific trigger objects, it is identified as a model obtained through a model extraction attack.
### Main contributions
1. **First proposal**: This is the first backdoor-based watermarking method for object detection models that defends against model extraction attacks.
2. **Practicality**: The method is feasible under a realistic threat model. It does not require modifying the input images; it operates only on the API responses.
3. **Stealthiness**: The bounding boxes are modified only slightly, so the changes are hard for attackers to detect.
4. **Function preservation**: The modified API responses preserve the object detection capability of the service, so legitimate users are not affected.
### Method overview
#### 1. Poisoning phase
- **Trigger object selection**: Select objects with specific characteristics as trigger objects.
- **Bounding box modification**: In the API response, only slightly modify the bounding boxes of the trigger objects (such as expanding or shrinking) to inject the backdoor.
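
The poisoning step above can be sketched as a post-processing function applied to the API response. This is a minimal illustration, not the authors' implementation: the `(x1, y1, x2, y2)` box format, the `is_trigger` predicate, the function names, and the choice to scale width and height about the box center are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2) in pixels


def scale_box(box: Box, factor: float, img_w: float, img_h: float) -> Box:
    """Scale a box about its center by `factor`, clipped to the image bounds."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) / 2.0 * factor
    half_h = (y2 - y1) / 2.0 * factor
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(img_w, cx + half_w),
        min(img_h, cy + half_h),
    )


def watermark_response(
    detections: List[Dict],
    is_trigger: Callable[[Dict], bool],  # hypothetical trigger-selection rule
    factor: float,
    img_w: float,
    img_h: float,
) -> List[Dict]:
    """Return a copy of the detections with trigger-object boxes rescaled.

    Non-trigger detections are returned unchanged, so most of the
    response stays identical to the clean model output.
    """
    out = []
    for det in detections:
        new_det = dict(det)  # shallow copy; the caller's list is not mutated
        if is_trigger(det):
            new_det["box"] = scale_box(det["box"], factor, img_w, img_h)
        out.append(new_det)
    return out
```

For instance, calling `watermark_response` with `factor=1.05` and a predicate such as `lambda d: d["label"] == "sign"` would enlarge only the boxes labeled `sign` (a hypothetical trigger class) by 5% in each dimension and leave every other detection untouched.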
#### 2. Verification phase
- **Verification dataset preparation**: Prepare a verification dataset containing trigger objects and non-trigger objects.
- **Suspiciousness score**: Calculate the suspiciousness score by comparing the prediction results of the target model and the suspicious model on the verification dataset. If the suspicious model shows abnormal bounding box changes on the trigger objects, it is confirmed as an extracted model.
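
One way to picture such a comparison is sketched below. The exact score used in the paper is not given in this summary, so the following is only an assumed stand-in: it measures how much larger the suspicious model's boxes are than the target model's boxes on trigger objects, relative to non-trigger objects. The pairing of boxes between the two models, the function names, and the area-based ratio are all hypothetical.

```python
import math
from statistics import fmean
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)
BoxPair = Tuple[Box, Box]  # (target-model box, suspicious-model box) for one object


def area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def size_ratio(pair: BoxPair) -> float:
    """Linear size ratio of the suspicious box to the target box."""
    target_box, suspect_box = pair
    return math.sqrt(area(suspect_box) / area(target_box))


def suspiciousness(trigger_pairs: List[BoxPair], clean_pairs: List[BoxPair]) -> float:
    """Illustrative score (an assumption, not the paper's definition).

    Mean size ratio on trigger objects minus mean size ratio on
    non-trigger objects. A value near 0 suggests no watermark; a value
    near the injected scaling (e.g. about 0.05 for a 5% expansion)
    suggests the suspicious model learned the modified boxes.
    """
    if not trigger_pairs or not clean_pairs:
        raise ValueError("both object groups must be non-empty")
    return fmean(map(size_ratio, trigger_pairs)) - fmean(map(size_ratio, clean_pairs))
```

In practice a decision threshold on such a score would have to be calibrated against independently trained models; this sketch leaves that choice open.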
### Experimental results
The paper reports experiments on three object detection datasets (PascalVOC2007, Self-Driving Cars - TrafficSigns, CityPersons). The results show that BBW identifies extracted models with 100% accuracy across a wide variety of experimental scenarios. For example, on the VOC07 dataset, expanding the bounding boxes of only 2% of the detected objects by 5% is enough for the extracted model to be fully verified.
### Conclusion
According to the reported experiments, BBW is an effective defense against model extraction attacks on object detection models served from the cloud. The method is practical and stealthy, and it protects the model owner's intellectual property without degrading the detection service offered to legitimate users.