ReID2.0: From Person ReID to Portrait Interpretation
Wang Shengjin, Dou Zhaopeng, Fan Yixuan, Li Yali
DOI: https://doi.org/10.11834/jig.220700
2023-01-01
Journal of Image and Graphics
Abstract: Person re-identification (person ReID) has attracted growing attention in computer vision. It identifies a target pedestrian in images and recognizes that pedestrian's reappearances across time and space. Person ReID can also be used to retrieve specific pedestrians from image or video databases. Research on person re-identification is driven by strong practical demand and has potential applications in public safety, new retail, and human-computer interaction. Conventional face recognition, as used in forensics, is one of the most powerful technical means of identity verification. However, it depends on cooperative imaging, which is restricted to a rigid angle and distance, and face recognition has since evolved toward semi-cooperative settings. In practice, public surveillance involves a large number of non-cooperative scenarios in which the monitored subjects do not need to cooperate with the camera, or even be aware that they are being filmed; in some extreme cases, suspects may deliberately cover their key biometric features. Public security surveillance therefore urgently needs person re-identification to track targets over wide ranges of time and space. Supported by person re-identification technology, it is possible to locate the face of a person first observed from behind and then interpret the facial features further. A defining characteristic of the person re-identification task is that the recognition object is a non-cooperative target. Pedestrian images are therefore subject to complex changes in posture, viewing angle, illumination, imaging quality, and degree of occlusion. The key challenges are learning feature representations from images that change over time and space, and mapping the raw image data to distinctive features.
In addition, data collection and labeling are more challenging for person re-identification than for face recognition, and existing datasets are far less rich than face recognition datasets, a gap that remains to be bridged. Learned feature extractors commonly overfit severely, and the heterogeneity between datasets still makes cross-dataset use of a model a major challenge. Breakthroughs in person re-identification call for interdisciplinary research. Rank-1 accuracy and mean average precision (mAP) have improved greatly on multiple datasets, and some methods have begun to be applied in practice. However, current person re-identification research focuses mainly on clothing appearance and lacks an explicit multi-view observation and description of pedestrian appearance, which is inconsistent with the mechanism of human observation. Humans perceive comprehensively and can form an observational description of a target from appearance information gathered from multiple views. For example, when we meet a familiar friend on the street, we recognize the person quickly and subconsciously even if we cannot see the face clearly. Beyond clothing, we also perceive further contextual information, including gender, age, body shape, posture, facial expression, and mental state. This paper aims to break with the existing setting of the person re-identification task and form a comprehensive observational description of pedestrians. To advance person re-identification research, we propose portrait interpretation calculation (ReID2.0), which builds on prior person re-identification. It observes and describes a pedestrian's attributes and states from four aspects: 1) appearance, 2) posture, 3) emotion, and 4) intention.
Here, appearance information describes the apparent features of the face and biological characteristics; posture information describes the static and sequential body-shape characteristics of the human body; emotion information covers a pedestrian's facial expression and emotional expression; and intention information covers the description of a pedestrian's behavior and the prediction of intent. These four types of information are based on multi-view observation and perception of pedestrians and, to a certain extent, construct a human-centered representation. Because such labeling is difficult, no existing dataset meets the description requirements of all four aspects. We therefore present a benchmark dataset, Portrait250K, for portrait interpretation calculation. Portrait250K consists of 250 000 portraits from 51 movies and TV series from various countries. Each portrait has eight human-annotated labels corresponding to eight subtasks. The distributions of images and labels reflect real-world characteristics, such as a) long-tailed or unbalanced distributions, b) diverse occlusions, c) truncations, d) lighting, e) clothing, f) makeup, and g) changing background scenes. To advance portrait interpretation calculation on Portrait250K, we design metrics for each subtask and develop an integrated evaluation metric, portrait interpretation quality (PIQ), which balances the weights of the subtasks. Furthermore, we design a baseline method based on the multi-task learning paradigm. It focuses on multi-task representation learning and introduces a scheme named feature space separation, together with a simple learning loss.
The proposed portrait interpretation calculation forms a comprehensive observational description of pedestrians, which provides a reference for further research on person re-identification and human-like agents.
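
For readers unfamiliar with the Rank-1 and mAP metrics mentioned above, the following is a minimal sketch of how they are commonly computed for a retrieval task. It is not taken from the paper: the function name `rank1_and_map`, the distance-matrix input, and the toy data are illustrative assumptions, and the same-camera filtering used in standard ReID protocols is omitted for brevity.

```python
from typing import Sequence, Tuple


def rank1_and_map(
    dist: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
) -> Tuple[float, float]:
    """Return (Rank-1 accuracy, mAP) for a query-by-gallery distance matrix.

    dist[q][g] is the distance between query q and gallery item g
    (smaller means more similar). Hypothetical helper for illustration.
    """
    hits = 0        # queries whose top-ranked gallery item has the same identity
    ap_sum = 0.0    # running sum of per-query average precision
    valid = 0       # queries that have at least one true match in the gallery

    for q, row in enumerate(dist):
        # Rank gallery items from most to least similar.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        n_relevant = sum(matches)
        if n_relevant == 0:
            continue  # no true match exists; skip this query
        valid += 1
        hits += int(matches[0])

        # Average precision: mean of precision at each rank where a match occurs.
        found = 0
        precision_sum = 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precision_sum += found / rank
        ap_sum += precision_sum / n_relevant

    if valid == 0:
        raise ValueError("no query has a matching gallery identity")
    return hits / valid, ap_sum / valid


# Toy example: two queries of identity 1 against a gallery of three images.
toy_dist = [[0.1, 0.5, 0.3], [0.6, 0.2, 0.4]]
rank1, mean_ap = rank1_and_map(toy_dist, query_ids=[1, 1], gallery_ids=[1, 2, 1])
print(f"Rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")
```

In the toy example, the first query ranks both true matches first, while the second ranks a wrong identity first, so Rank-1 penalizes it fully whereas mAP still credits the matches found at ranks 2 and 3.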