Pedestrian Attribute Recognition Method Based on the Progressive Iterative Optimization
Ding Zhengyan, Shang Yanfeng, Zhang Chongyang
DOI: https://doi.org/10.11834/jig.221064
2023-01-01
Journal of Image and Graphics
Abstract: Objective Pedestrian attribute recognition is currently challenged by severely unbalanced sample distributions in some attribute categories. To address this problem, we develop a progressive iterative optimization method for pedestrian attribute recognition. Method First, a data generation model based on a masked autoencoder is used to augment the data of the unbalanced categories, so that a general large model can be adapted to the small task. The balanced attributes-data generation model (BA-DGM), built on the masked autoencoder, masks the original pedestrian images with a random masking ratio and generates new images for the categories with few samples. In this way, latent information such as the topological relationships of the visible area can be fully mined, and the pedestrian images generated from the latent features are more robust. This also demonstrates that the autoencoder model can effectively learn a universal feature representation of the target pedestrian, including common features such as the relationships among the key components of the pedestrian. Second, a discrimination model is used to filter the newly generated samples for consistency, and a heuristic attention mechanism is adopted with reference to generative adversarial networks (GANs). The new attention features-data discrimination model (AF-DDM) diversifies the samples while preserving the key features of the attributes, which enhances the interpretability of the recognition model. The filtered data are then used to train the model so that it learns effective attribute-related features. To train the discrimination model, a 50-layer residual network is adopted as the backbone and trained on the original attribute recognition dataset within a multi-label classification framework.
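The random masking step of the data generation model can be illustrated with a minimal sketch. It only shows how a random masking ratio selects patches to hide; the reconstruction of the hidden patches by the masked autoencoder is not shown. The function names, the patch grid, the 0.5 to 0.9 ratio range, and the toy image are all assumptions made for illustration and are not taken from the paper.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio, rng):
    """Return a grid_h x grid_w grid of booleans; True marks a masked patch."""
    n_patches = grid_h * grid_w
    n_masked = int(round(n_patches * mask_ratio))
    # Pick the masked patch indices uniformly at random, without replacement.
    masked = set(rng.sample(range(n_patches), n_masked))
    return [[(r * grid_w + c) in masked for c in range(grid_w)]
            for r in range(grid_h)]


def apply_mask(image, mask, patch):
    """Zero out the masked patches of a 2D image given as a list of rows."""
    out = [row[:] for row in image]
    for r, mask_row in enumerate(mask):
        for c, is_masked in enumerate(mask_row):
            if not is_masked:
                continue
            for y in range(r * patch, (r + 1) * patch):
                for x in range(c * patch, (c + 1) * patch):
                    out[y][x] = 0
    return out


rng = random.Random(0)
# Hypothetical ratio range; the abstract only says the ratio is random.
mask_ratio = rng.uniform(0.5, 0.9)
# Toy 16 x 8 "pedestrian image" split into a 4 x 2 grid of 4 x 4 patches.
image = [[1] * 8 for _ in range(16)]
mask = random_patch_mask(4, 2, mask_ratio, rng)
masked_image = apply_mask(image, mask, patch=4)
```

Drawing a fresh ratio and a fresh mask for each pass over a positive sample yields a different visible region each time, which is what allows several distinct images to be generated from one original.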
In the inference process of the discrimination model, all attribute labels are divided into two categories: key attribute labels and other related attribute labels. A newly generated sample is kept only if the discrimination model predicts its key attribute labels consistently with the original labels and with high confidence; otherwise it is discarded. Finally, the pedestrian attribute recognition model and its training data are further optimized through cyclic iteration of data generation and discrimination. To improve the generalization ability of the model, a knowledge distillation framework is also used to fuse the discrimination models trained on the balanced sample data. In the progressive iterations-distillation fusion model (PI-DFM), the attribute discrimination models obtained after multiple iterations serve as the teacher models, and the category-balanced attribute recognition dataset serves as the training data. These teacher models complement one another because they are trained on datasets with different sample proportions. The student network has the same structure as the teacher models, and the Kullback-Leibler (KL) divergence between the student output and the teacher output is used as the distillation loss. In large-scale practical applications, the sample proportions of the test data and the training data may differ. To improve generalization in such open, uncertain scenarios, the knowledge distillation framework integrates teacher models trained on data with different sample proportions. Result Experimental results demonstrate that the proposed optimization method effectively improves the accuracy of the model on four popular evaluation datasets.
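The KL-divergence distillation loss mentioned in the Method can be sketched for multi-label outputs as follows. This is one plausible reading rather than the authors' implementation: it assumes each attribute is an independent sigmoid output, that several teachers are fused by averaging their probabilities, and that the divergence is taken in the direction KL(teacher || student). None of these choices is stated in the abstract, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_kl(p, q, eps=1e-7):
    """KL(p || q) between two Bernoulli distributions with success probabilities p and q."""
    # Clamp both probabilities away from 0 and 1 to keep the logarithms finite.
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def distillation_loss(student_logits, teacher_logits_list):
    """Mean per-attribute KL(teacher || student) for one sample.

    Assumption: teacher outputs are fused by averaging their probabilities.
    """
    n_attr = len(student_logits)
    n_teachers = len(teacher_logits_list)
    loss = 0.0
    for a in range(n_attr):
        teacher_p = sum(sigmoid(t[a]) for t in teacher_logits_list) / n_teachers
        student_p = sigmoid(student_logits[a])
        loss += bernoulli_kl(teacher_p, student_p)
    return loss / n_attr


# Toy example: three attributes, two teachers from different iterations.
student = [0.2, -1.0, 1.5]
teachers = [[1.0, -2.0, 2.0], [0.5, -1.5, 1.0]]
loss = distillation_loss(student, teachers)
```

The loss is zero when the student reproduces the fused teacher probabilities exactly and grows as the two sets of predictions diverge.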
Two evaluation metrics are computed, one over attributes and one over samples: 1) the mean accuracy over all attributes and 2) the F1 score over all samples, which is the harmonic mean of the mean precision and the mean recall. For example, on the richly annotated pedestrian v2 (RAPv2) dataset, the mean accuracy increases by about 5.0% and the average F1 score by about 1.7%, with model complexity unchanged. After several rounds of cyclic iteration, the number of unbalanced categories in the original data is reduced to zero, so the dataset itself is also optimized. In the ablation studies, new samples are randomly generated for each positive sample image, and the discrimination model is then used to filter out inconsistent samples. The spatial probability distribution of the preserved details is analyzed experimentally through the masked regions of the filtered samples. With the heuristic attention mechanism, the data discrimination model better retains the features relevant to the key attributes of the target pedestrian, which demonstrates that the interpretability of the discrimination model can be further improved by deeply mining the distribution of related features for different attributes. Conclusion The progressive iterative optimization strategy proposed in this paper complements existing improvement methods well and helps to further improve the accuracy of the recognition model. To better model the relationships among multiple pedestrian attributes and further improve the interpretability of the recognition model, future research can focus on combining the universal feature representation of the masked autoencoder (MAE) model with prior knowledge such as the human skeleton structure.
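The two evaluation metrics named in the Result can be sketched as below. The abstract does not give formulas, so this assumes the definitions commonly used on pedestrian attribute benchmarks: mean accuracy averages, per attribute, the true positive rate and the true negative rate, and the sample-level F1 is the harmonic mean of the mean per-sample precision and the mean per-sample recall. The function names and the toy labels are illustrative only.

```python
def mean_accuracy(y_true, y_pred):
    """Attribute-level mean accuracy: average of (TPR + TNR) / 2 over attributes."""
    n_attr = len(y_true[0])
    total = 0.0
    for a in range(n_attr):
        pos = [i for i, row in enumerate(y_true) if row[a] == 1]
        neg = [i for i, row in enumerate(y_true) if row[a] == 0]
        tpr = sum(y_pred[i][a] == 1 for i in pos) / len(pos) if pos else 0.0
        tnr = sum(y_pred[i][a] == 0 for i in neg) / len(neg) if neg else 0.0
        total += (tpr + tnr) / 2.0
    return total / n_attr


def sample_f1(y_true, y_pred):
    """Sample-level F1: harmonic mean of mean precision and mean recall."""
    precisions, recalls = [], []
    for truth, pred in zip(y_true, y_pred):
        hits = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
        precisions.append(hits / sum(pred) if sum(pred) else 0.0)
        recalls.append(hits / sum(truth) if sum(truth) else 0.0)
    prec = sum(precisions) / len(precisions)
    rec = sum(recalls) / len(recalls)
    return 2.0 * prec * rec / (prec + rec) if prec + rec else 0.0


# Toy data: four samples, three binary attributes.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
m_a = mean_accuracy(y_true, y_pred)
f1 = sample_f1(y_true, y_pred)
```

Because mean accuracy weights positives and negatives of each attribute equally, it is sensitive to rare attribute categories, which is why it is a natural measure of progress on the imbalance problem.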