Abstract: Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. We propose Pose-dIVE, a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution. Our objective is to augment the training dataset to enable existing Re-ID models to learn features unbiased by human pose and camera viewpoint variations. To achieve this, we leverage the knowledge of pre-trained large-scale diffusion models. By conditioning the diffusion model on both the human pose and camera viewpoint concurrently through the SMPL model, we generate training data with diverse human poses and camera viewpoints. Experimental results demonstrate the effectiveness of our method in addressing human pose bias and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches.

What problem does this paper attempt to address?

The paper addresses **the limited generalization of person re-identification (Re-ID) models caused by variation in human pose and camera viewpoint**. In practical applications, the Re-ID task faces two challenges:

1. **Pose and viewpoint variation**: Images of the same person captured by different cameras can look very different because of differences in pose and viewpoint, which makes identification difficult.
2. **Dataset limitations**: Existing Re-ID datasets usually lack diversity and scalability, especially in pose and viewpoint, which limits model generalization. Manually labeling individuals across multiple cameras is also very difficult.

To address these problems, the paper proposes **Pose-dIVE**, a new data augmentation method. By introducing sparse and underrepresented human pose and camera viewpoint samples into the training data, Pose-dIVE aims to let existing Re-ID models learn features that are unaffected by pose and viewpoint changes, thereby improving generalization and performance.

### Main contributions

1. **Proposing the Pose-dIVE framework**: Uses pre-trained large-scale diffusion models (such as Stable Diffusion) together with the SMPL model to generate training data with diverse poses and viewpoints.
2. **Reducing pose bias**: By generating samples for sparsely populated poses and viewpoints, Pose-dIVE reduces the pose bias in the training data and improves the generalization of the Re-ID model.
3. **Experimental verification**: Experimental results show that Pose-dIVE significantly improves the performance of existing models on multiple Re-ID benchmark datasets and outperforms other data-augmentation-based methods.
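
One way to picture "introducing sparse and underrepresented samples" is inverse-frequency sampling over viewpoint bins. The sketch below is only an illustration under stated assumptions, not the paper's procedure: the function names, the use of camera azimuth as the viewpoint descriptor, and the choice of 8 equal-width bins are all hypothetical.

```python
import random
from collections import Counter


def viewpoint_bin(azimuth_deg, num_bins=8):
    """Map a camera azimuth in degrees to one of num_bins equal-width bins."""
    width = 360.0 / num_bins
    index = int((azimuth_deg % 360.0) // width)
    # Guard against floating-point wrap-around landing exactly on 360.0.
    return min(index, num_bins - 1)


def inverse_frequency_weights(bin_ids):
    """Give each sample a weight proportional to 1 / (size of its bin)."""
    counts = Counter(bin_ids)
    raw = [1.0 / counts[b] for b in bin_ids]
    total = sum(raw)
    return [w / total for w in raw]


def sample_rare_viewpoints(azimuths, k, num_bins=8, seed=0):
    """Draw k azimuths with replacement, favoring sparsely populated bins."""
    bins = [viewpoint_bin(a, num_bins) for a in azimuths]
    weights = inverse_frequency_weights(bins)
    rng = random.Random(seed)
    return rng.choices(azimuths, weights=weights, k=k)
```

For example, with azimuths `[0, 5, 10, 15, 200]` the four near-frontal samples share one bin and receive weight 0.125 each, while the single sample at 200 degrees receives 0.5, so it is drawn far more often as a generation target.
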
### Formula explanation

The formulas in the paper mainly concern the model architecture and training process, such as the conditioning inputs of the generative model and the loss function. The key formulas are:

- **Conditional input**:
  \[ \text{Condition} = \{\text{Depth Map}, \text{Surface Normals}, \text{Skeleton}\} \]
- **Loss function**:
  \[ \mathcal{L} = \mathbb{E}_{x \sim p_{\text{data}}}\left[\|x - \hat{x}\|^2\right] \]
  where \( x \) is the real image, \( \hat{x} \) is the generated image, and the loss is the mean squared error (MSE).

With these components, Pose-dIVE addresses the challenges that pose and viewpoint variation create for the Re-ID task and improves the robustness and generalization of the model.
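
As a rough illustration of the two formulas above, the following sketch groups the three conditioning inputs into one structure and estimates the loss as the batch average of the squared L2 distance between real and generated images. It assumes images are flat lists of floats; `build_condition`, `squared_l2`, and `reconstruction_loss` are hypothetical names, and this is not the paper's training code.

```python
def build_condition(depth_map, surface_normals, skeleton):
    """Bundle the three conditioning inputs named in the summary."""
    return {
        "depth_map": depth_map,
        "surface_normals": surface_normals,
        "skeleton": skeleton,
    }


def squared_l2(x, x_hat):
    """Squared L2 distance ||x - x_hat||^2 between two flat images."""
    if len(x) != len(x_hat):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def reconstruction_loss(real_batch, generated_batch):
    """Empirical estimate of E[||x - x_hat||^2] over a batch of images."""
    if not real_batch or len(real_batch) != len(generated_batch):
        raise ValueError("batches must be non-empty and equally sized")
    errors = [squared_l2(x, xh) for x, xh in zip(real_batch, generated_batch)]
    return sum(errors) / len(errors)


# Two tiny 2-pixel "images": per-image errors are 1.0 and 4.0.
real = [[0.0, 1.0], [1.0, 1.0]]
generated = [[0.0, 0.0], [1.0, 3.0]]
loss = reconstruction_loss(real, generated)  # 2.5
```

The batch mean stands in for the expectation over \( p_{\text{data}} \); in a real pipeline the same quantity would normally be computed on tensors by a deep learning framework.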

Pose-dIVE: Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

Diffusion Augmentation and Pose Generation Based Pre-Training Method for Robust Visible-Infrared Person Re-Identification

ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models

Pose-Guided Feature Learning with Knowledge Distillation for Occluded Person Re-Identification.

PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification

Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation

Di^2Pose: Discrete Diffusion Model for Occluded 3D Human Pose Estimation

DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model

PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation

Diffusion-Based Pose Refinement and Multi-Hypothesis Generation for 3D Human Pose Estimation

Beyond Augmentation: Empowering Model Robustness under Extreme Capture Environments

DiffPose: Toward More Reliable 3D Pose Estimation

Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D Human Pose Estimaiton

Pose-driven Deep Convolutional Model for Person Re-identification

PGDS: Pose-Guidance Deep Supervision for Mitigating Clothes-Changing in Person Re-Identification

DiffBody: Diffusion-based Pose and Shape Editing of Human Images

Pose Invariant Person Re-Identification using Robust Pose-transformation GAN

On exploring pose estimation as an auxiliary learning task for Visible–Infrared Person Re-identification

PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture

Generalizable Person Re-Identification via Viewpoint Alignment and Fusion