Pose-dIVE: Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

Inès Hyeonsu Kim, JoungBin Lee, Woojeong Jin, Soowon Son, Kyusun Cho, Junyoung Seo, Min-Seop Kwak, Seokju Cho, JeongYeol Baek, Byeongwon Lee, Seungryong Kim
2024-10-15
Abstract: Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. We propose Pose-dIVE, a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution. Our objective is to augment the training dataset to enable existing Re-ID models to learn features unbiased by human pose and camera viewpoint variations. To achieve this, we leverage the knowledge of pre-trained large-scale diffusion models. By conditioning the diffusion model on both the human pose and camera viewpoint concurrently through the SMPL model, we generate training data with diverse human poses and camera viewpoints. Experimental results demonstrate the effectiveness of our method in addressing human pose bias and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses **the limited generalization of person re-identification (Re-ID) models caused by variation in human pose and camera viewpoint**. In practice, the Re-ID task faces two challenges:

1. **Pose and viewpoint variation**: Images of the same person captured by different cameras can look very different because of differences in pose and viewpoint, which makes identification difficult.
2. **Dataset limitations**: Existing Re-ID datasets usually lack diversity and scalability, particularly in pose and viewpoint, which limits how well models generalize. Manually labeling individuals across multiple cameras is also very difficult.

To address these problems, the paper proposes **Pose-dIVE**, a new data augmentation method. By adding sparse and underrepresented human pose and camera viewpoint samples to the training data, Pose-dIVE aims to let existing Re-ID models learn features that are not biased by pose and viewpoint variation, improving their generalization and performance.

### Main contributions

1. **The Pose-dIVE framework**: It uses pre-trained large-scale diffusion models (such as Stable Diffusion) together with the SMPL model to generate training data with diverse poses and viewpoints.
2. **Reduced pose bias**: By generating samples for sparsely represented poses and viewpoints, Pose-dIVE reduces the pose bias in the training data and improves the generalization of the Re-ID model.
3. **Experimental verification**: Experiments show that Pose-dIVE improves the performance of existing models on multiple Re-ID benchmark datasets and outperforms other data-augmentation-based methods.
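
One way to picture "adding sparse and underrepresented samples" is inverse-frequency sampling over discretized viewpoint bins. The sketch below is only an illustration under that assumption and is not the paper's procedure: the bin layout, the function names `inverse_frequency_weights` and `sample_target_bins`, and the `smoothing` parameter are all hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_weights(bin_labels, num_bins, smoothing=1.0):
    """Return one normalized sampling weight per bin; rarer bins get larger weights.

    `smoothing` (hypothetical parameter) keeps empty bins from dividing by zero.
    """
    counts = Counter(bin_labels)
    raw = [1.0 / (counts.get(b, 0) + smoothing) for b in range(num_bins)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_target_bins(bin_labels, num_bins, k, seed=0):
    """Draw k target bins for augmentation, favoring underrepresented ones."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(bin_labels, num_bins)
    return rng.choices(range(num_bins), weights=weights, k=k)


# Toy training set: four viewpoint bins, heavily skewed toward bin 0.
train_bins = [0] * 70 + [1] * 20 + [2] * 9 + [3] * 1
weights = inverse_frequency_weights(train_bins, num_bins=4)
targets = sample_target_bins(train_bins, num_bins=4, k=1000)
target_counts = Counter(targets)
```

In this toy setup the rarest bin receives the largest share of augmentation targets, so the augmented distribution is flatter than the original one. A generator conditioned on pose and viewpoint would then be asked to synthesize images for the sampled targets.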
### Formula explanation

The formulas in the paper mainly concern the model architecture and the training process, such as the conditional input to the generative model and the loss function. The key formulas are:

- **Conditional input**:
\[ \text{Condition} = \{\text{Depth Map}, \text{Surface Normals}, \text{Skeleton}\} \]
- **Loss function**:
\[ \mathcal{L} = \mathbb{E}_{x \sim p_{\text{data}}}\left[\|x - \hat{x}\|^2\right] \]
where \( x \) is the real image and \( \hat{x} \) is the generated image; the loss is the mean squared error (MSE).

With these components, Pose-dIVE addresses the challenges that pose and viewpoint variation pose for the Re-ID task and improves the robustness and generalization of the model.
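
The two formulas above can be made concrete with a small pure-Python sketch. It assumes images and condition maps are stored as H x W x C nested lists and that the three conditions are concatenated along the channel axis; both the storage layout and the helper names `stack_conditions` and `mse_loss` are illustrative assumptions, not details taken from the paper.

```python
def stack_conditions(depth, normals, skeleton):
    """Concatenate per-pixel condition maps along the channel axis.

    Each argument is an H x W x C nested list; the spatial sizes must match.
    Channel-wise concatenation is an assumption made for this sketch.
    """
    h, w = len(depth), len(depth[0])
    for name, m in (("normals", normals), ("skeleton", skeleton)):
        if len(m) != h or any(len(row) != w for row in m):
            raise ValueError(f"{name} map does not match depth map size {h}x{w}")
    return [
        [depth[i][j] + normals[i][j] + skeleton[i][j] for j in range(w)]
        for i in range(h)
    ]


def _flatten(img):
    """Flatten an H x W x C nested list into a flat list of values."""
    return [v for row in img for px in row for v in px]


def mse_loss(x, x_hat):
    """Mean squared error between a real image x and a generated image x_hat.

    This averages over all elements, so it equals ||x - x_hat||^2 divided by
    the number of elements.
    """
    a, b = _flatten(x), _flatten(x_hat)
    if len(a) != len(b):
        raise ValueError("x and x_hat must have the same number of elements")
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Toy 2x2 example: 1 depth channel, 3 normal channels, 1 skeleton channel.
H, W = 2, 2
depth = [[[0.1] for _ in range(W)] for _ in range(H)]
normals = [[[0.0, 0.0, 1.0] for _ in range(W)] for _ in range(H)]
skeleton = [[[0.0] for _ in range(W)] for _ in range(H)]
condition = stack_conditions(depth, normals, skeleton)

x = [[[0.5, 0.5, 0.5] for _ in range(W)] for _ in range(H)]
x_hat = [[[0.0, 0.0, 0.0] for _ in range(W)] for _ in range(H)]
loss = mse_loss(x, x_hat)
```

Here `condition` carries five channels per pixel, and `loss` is the per-element average of the squared differences between `x` and `x_hat`.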