Abstract: Physical adversarial patches printed on clothing can easily allow individuals to evade person detectors. However, most existing adversarial patch generation methods prioritize attack effectiveness over stealthiness, resulting in patches that are aesthetically unpleasing. Although existing methods using generative adversarial networks or diffusion models can produce more natural-looking patches, they often struggle to balance stealthiness with attack effectiveness and lack flexibility for user customization. To address these challenges, we propose a novel diffusion-based customizable patch generation framework termed DiffPatch, specifically tailored for creating naturalistic and customizable adversarial patches. Our approach enables users to utilize a reference image as the source, rather than starting from random noise, and incorporates masks to craft naturalistic patches of various shapes, not limited to squares. To prevent the original semantics from being lost during the diffusion process, we employ Null-text inversion to map random noise samples to a single input image and generate patches through Incomplete Diffusion Optimization (IDO). Notably, while maintaining a natural appearance, our method achieves a comparable attack performance to state-of-the-art non-naturalistic patches when using similarly sized attacks. Using DiffPatch, we have created a physical adversarial T-shirt dataset, AdvPatch-1K, specifically targeting YOLOv5s. This dataset includes over a thousand images across diverse scenarios, validating the effectiveness of our attack in real-world environments. Moreover, it provides a valuable resource for future research.

### What problems does this paper attempt to solve?

This paper addresses three key problems in adversarial patch generation:

1. **Balance between attack effectiveness and concealment**:
   - Existing patch generation methods focus on attack effectiveness and neglect concealment and natural appearance. The resulting patches look unnatural and are easily noticed by humans.
   - Methods based on generative adversarial networks (GANs) or diffusion models can generate more natural patches, but they struggle to balance stealthiness with attack effectiveness and offer little room for user customization.
2. **Flexibility of user customization**:
   - Existing methods make it difficult to generate patches of a specific shape and style from a reference image supplied by the user.
3. **Preserving the original semantics**:
   - Keeping the semantic content of the original image intact during generation, while the patch still looks natural, is a challenge.

To solve these problems, the authors propose **DiffPatch**, a new diffusion-model-based adversarial patch generation framework. It produces natural and customizable adversarial patches through four key techniques:

- **Generation based on reference images**: Users provide a reference image as input instead of generating patches from random noise.
- **Mask control**: Masks are used to generate patches of different shapes, not just squares.
- **Null-text inversion**: The unconditional (null-text) embedding vector is optimized so that the generated patch retains the semantic information of the original image.
- **Incomplete Diffusion Optimization (IDO)**: An IoU-Detection Loss accelerates convergence, and a constraint on the perturbation values keeps the patch looking natural.
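Two of the ideas above, mask control and a bound on the perturbation values, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `project_linf` and `apply_masked_patch`, the nested-list image layout, and the bound `eps` are assumptions made only for illustration, and the summary does not say whether DiffPatch applies the constraint in pixel space or latent space.

```python
def project_linf(delta, eps):
    """Clamp every perturbation value to the interval [-eps, eps]."""
    return [[max(-eps, min(eps, v)) for v in row] for row in delta]


def apply_masked_patch(image, patch, mask, top, left):
    """Paste `patch` onto a copy of `image` at (top, left), only where mask is 1."""
    out = [row[:] for row in image]  # copy so the input image stays untouched
    for i, mask_row in enumerate(mask):
        for j, keep in enumerate(mask_row):
            if keep:
                out[top + i][left + j] = patch[i][j]
    return out


# Toy data (hypothetical): a 4x4 grayscale image and a 2x2 patch.
image = [[0.0] * 4 for _ in range(4)]
patch = [[0.9, 0.8], [0.7, 0.6]]
mask = [[1, 0], [1, 1]]  # non-square shape: the top-right pixel is excluded

# Bound the perturbation first, then add it to the patch.
delta = project_linf([[0.30, -0.02], [-0.50, 0.04]], eps=0.1)
perturbed = [[p + d for p, d in zip(p_row, d_row)]
             for p_row, d_row in zip(patch, delta)]

result = apply_masked_patch(image, perturbed, mask, top=1, left=1)
```

Values outside the bound (0.30 and -0.50) are clamped to ±0.1, while the masked-out pixel of the image keeps its original value.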
In addition, the authors created **AdvPatch-1K**, a physical-world adversarial T-shirt dataset containing more than 1,000 images, to verify the effectiveness of the method in real environments.

### Formula summary

- **DDIM inverse process formula**:
  \[ z_{t+1}=\sqrt{\frac{\alpha_{t+1}}{\alpha_t}}\,z_t+\sqrt{\alpha_{t+1}}\left(\sqrt{\frac{1}{\alpha_{t+1}}-1}-\sqrt{\frac{1}{\alpha_t}-1}\right)\cdot\epsilon_\theta(z_t,t,C) \]
  where $\epsilon_\theta$ is a pre-trained U-Net that predicts noise, $C$ is a conditional embedding vector, and $\alpha_t$ is a scaling factor computed from the noise schedule $\beta_0,\dots,\beta_T\in(0,1)$.
- **Null-text embedding optimization objective function**:
  \[ \min_{\phi_t}\|z_{t-1}^*-z_{t-1}(\bar{z}_t,\phi_t,C)\|_2^2 \]
- **IoU-Detection Loss**:
  \[ L_{IoU}=\frac{1}{N}\sum_{i=1}^N\left(\frac{1}{M}\sum_{k=1}^M\left[\mathbb{1}\left(\max_j\text{IoU}(J_j,J'_k)>t\right)P(J'_k)\right]\right) \]
  where $M$ is the number of detected bounding boxes, $\text{IoU}(J_j,J'_k)$ is the intersection-over-union of the predicted bounding box $J'_k$ and the ground-truth bounding box $J_j$, $\mathbb{1}(\cdot)$ is the indicator function with IoU threshold $t$, and $P(J'_k)$ is the product of the object probability and the classification probability.
- **Perturbation constraint**:
  \[ \delta_t=\text{Proj}_\infty(\delta
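The DDIM inversion step and the IoU-Detection Loss from the formula summary can be sketched in plain Python. This is a hedged illustration, not the paper's code. It assumes the standard DDIM inversion update used by Null-text inversion, treats `eps_pred` as a stand-in for the U-Net output $\epsilon_\theta(z_t,t,C)$, takes boxes as `(x1, y1, x2, y2)` tuples, and lets an image with no detections contribute zero (the formula leaves the $M=0$ case undefined). All function names are hypothetical.

```python
import math


def ddim_inversion_step(z_t, eps_pred, alpha_t, alpha_next):
    """One DDIM inversion step z_t -> z_{t+1}, applied element-wise."""
    scale = math.sqrt(alpha_next / alpha_t)
    noise_coef = math.sqrt(alpha_next) * (
        math.sqrt(1.0 / alpha_next - 1.0) - math.sqrt(1.0 / alpha_t - 1.0)
    )
    return [scale * z + noise_coef * e for z, e in zip(z_t, eps_pred)]


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def iou_detection_loss(batch, threshold):
    """Average, over images, of the mean IoU-gated confidence of predicted boxes.

    batch: list of (gt_boxes, predictions), where predictions is a list of
    (box, confidence) and confidence = objectness * class probability.
    """
    total = 0.0
    for gt_boxes, predictions in batch:
        if not predictions:
            continue  # assumption: no detections contributes zero
        gated = 0.0
        for box, confidence in predictions:
            best = max((iou(gt, box) for gt in gt_boxes), default=0.0)
            if best > threshold:  # indicator: box overlaps a ground-truth box
                gated += confidence
        total += gated / len(predictions)
    return total / len(batch)


# Sanity check for the inversion step: if z_t = sqrt(a_t)*x0 + sqrt(1-a_t)*eps
# and the predicted noise equals eps, the step lands on the same (x0, eps)
# pair at the next noise level.
x0, eps, a_t, a_next = 2.0, 0.5, 0.9, 0.8
z_t = [math.sqrt(a_t) * x0 + math.sqrt(1 - a_t) * eps]
z_next = ddim_inversion_step(z_t, [eps], a_t, a_next)
expected = math.sqrt(a_next) * x0 + math.sqrt(1 - a_next) * eps

# One image, one ground-truth person box, two predictions (toy numbers).
batch = [([(0, 0, 10, 10)], [((0, 0, 10, 10), 0.8), ((20, 20, 30, 30), 0.5)])]
loss = iou_detection_loss(batch, threshold=0.5)
```

Only the first prediction overlaps the ground-truth box, so only its confidence enters the loss. Minimizing this quantity pushes down the confidence of detections that actually cover the person, which is the behavior the loss is meant to encourage in the attack.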

DiffPatch: Generating Customizable Adversarial Patches using Diffusion Model

Diffusion to Confusion: Naturalistic Adversarial Patch Generation Based on Diffusion Model for Object Detector

Natural Adversarial Patch Generation Method Based on Latent Diffusion Model

CAPatch: Physical Adversarial Patch against Image Captioning Systems

DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks

DiffPAD: Denoising Diffusion-based Adversarial Patch Decontamination

Real-world Adversarial Defense against Patch Attacks based on Diffusion Model

DOEPatch: Dynamically Optimized Ensemble Model for Adversarial Patches Generation

DePatch: Towards Robust Adversarial Patch for Evading Person Detectors in the Real World

DAP: A Dynamic Adversarial Patch for Evading Person Detectors

Generating Adversarial yet Inconspicuous Patches with a Single Image

MVPatch: More Vivid Patch for Adversarial Camouflaged Attacks on Object Detectors in the Physical World

Prompt-Guided Environmentally Consistent Adversarial Patch

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

Infrared Adversarial Patch Generation Based on Reinforcement Learning

Generating Visually Realistic Adversarial Patch

Generating Transferable and Stealthy Adversarial Patch via Attention-guided Adversarial Inpainting

Inconspicuous Adversarial Patches for Fooling Image Recognition Systems on Mobile Devices

Entropy-Boosted Adversarial Patch for Concealing Pedestrians in YOLO Models

Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model