Abstract: We introduce a new framework, dubbed Cerberus, for attribute-based person re-identification (reID). Our approach leverages person attribute labels to learn local and global person representations that encode specific traits, such as gender and clothing style. To achieve this, we define semantic IDs (SIDs) by combining attribute labels, and use a semantic guidance loss to align the person representations with the prototypical features of corresponding SIDs, encouraging the representations to encode the relevant semantics. Simultaneously, we enforce the representations of the same person to be embedded closely, enabling the model to recognize subtle differences in appearance that discriminate persons sharing the same attribute labels. To increase the generalization ability on unseen data, we also propose a regularization method that takes advantage of the relationships between SID prototypes. Our framework performs individual comparisons of local and global person representations between query and gallery images for attribute-based reID. By exploiting the SID prototypes aligned with the corresponding representations, it can also perform person attribute recognition (PAR) and attribute-based person search (APS) without bells and whistles. Experimental results on standard benchmarks for attribute-based person reID, Market-1501 and DukeMTMC, demonstrate the superiority of our model compared to the state of the art.
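The abstract describes matching by comparing each local and global representation of a query with the corresponding representation of every gallery image. The NumPy sketch below illustrates that kind of group-wise comparison. The dictionary layout, the hypothetical helpers `l2_normalize` and `rank_gallery`, and the choice to fuse the group-wise cosine similarities by plain averaging are all assumptions made for illustration. They are not details taken from the paper.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def rank_gallery(query_feats, gallery_feats):
    """Hypothetical helper: rank gallery images for one query, group by group.

    query_feats:   dict mapping group name -> (D,) array for the query image
    gallery_feats: dict mapping group name -> (M, D) array for M gallery images
    Returns gallery indices sorted from most to least similar.
    """
    groups = sorted(query_feats)
    # Compare each local/global representation separately: one (M,) vector
    # of cosine similarities per attribute group.
    per_group = [
        l2_normalize(gallery_feats[g]) @ l2_normalize(query_feats[g])
        for g in groups
    ]
    # Assumption: fuse the group-wise similarities by simple averaging.
    score = np.mean(per_group, axis=0)
    return np.argsort(-score)
```

For example, suppose every group (keys such as `"H"`, `"U"`, `"L"`, `"I"`, `"C"`) uses `np.eye(3)` as the gallery and `np.array([0.0, 2.0, 0.0])` as the query. In that case, gallery index 1 is ranked first. A weighted sum or a learned fusion would be equally plausible alternatives to the plain average.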
What problem does this paper attempt to address?
This paper addresses challenges in person re-identification (reID). Specifically, it aims to improve reID by introducing an attribute-based framework that tackles the following problems:
1. **Visual Similarity**: Different people may look very similar because they wear similar clothes or adopt similar poses, which makes it difficult for traditional reID methods to distinguish them.
2. **Zero-Shot Setting**: reID usually assumes that the person identity (ID) labels in the training and test data do not overlap, so the model must generalize to unseen identities.
3. **Insufficient Attribute Utilization**: Existing attribute-based reID methods directly use the features of an attribute network, which can actually degrade reID performance because these features tend to embed different people with the same attributes too closely.
To address these problems, the paper proposes a new framework, Cerberus. Its main contributions are:
1. **Semantic IDs (SIDs)**: By combining person attribute labels into SIDs, Cerberus learns a prototypical feature for each SID and uses these prototypes to guide the embedding of person representations, ensuring that the representations encode the corresponding semantics.
2. **Semantic Guidance Loss**: This loss encourages the local and global representations to align with the prototypes of their corresponding SIDs, so that they encode the relevant semantics. Alongside it, representations of the same person are enforced to be embedded closely, which preserves the differences between different people who share the same SID.
3. **Regularization Method**: To improve generalization on unseen data, Cerberus also proposes a regularization method that uses the relationships between SID prototypes to estimate the prototypes of unseen SIDs.
4. **Multi-task Capability**: Beyond reID, Cerberus can perform person attribute recognition (PAR) and attribute-based person search (APS) without task-specific fine-tuning.
With these components, Cerberus achieves state-of-the-art performance on standard attribute-based person reID benchmarks (Market-1501 and DukeMTMC-reID) and is also competitive on PAR and APS.
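To make the SID idea concrete, the sketch below rests on two assumptions. First, an SID is one joint combination of the attribute labels within a single attribute group. Second, PAR is performed by assigning a representation to the SID whose prototype is most cosine-similar. Both `build_sid_table` and `predict_attributes` are hypothetical helpers, and the attribute names and prototypes in the usage note are toy values.

```python
import itertools

import numpy as np


def build_sid_table(attribute_values):
    """Enumerate every combination of attribute labels in one group as an SID.

    attribute_values: list of label lists, e.g. [["male", "female"], ["short", "long"]]
    Returns a dict mapping each label tuple to an integer SID.
    """
    combos = itertools.product(*attribute_values)
    return {combo: sid for sid, combo in enumerate(combos)}


def predict_attributes(feature, prototypes, sid_table):
    """Hypothetical PAR step: pick the SID whose prototype is most cosine-similar.

    feature:    (D,) representation for one attribute group
    prototypes: (K, D) array, row k is the prototype of SID k
    """
    f = feature / np.linalg.norm(feature)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    best_sid = int(np.argmax(p @ f))
    # Invert the table to recover the attribute labels encoded by that SID.
    inverse = {sid: combo for combo, sid in sid_table.items()}
    return inverse[best_sid]
```

For example, `build_sid_table([["male", "female"], ["short hair", "long hair"]])` maps `("female", "short hair")` to SID 2. With `np.eye(4)` as the prototypes, the feature `[0.1, 0.0, 0.0, 0.9]` is decoded as `("female", "long hair")`. APS could, in principle, use the same table in reverse: look up the SID of the queried attributes, then rank gallery representations by their similarity to that SID's prototype.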
### Formula Summary
1. **Semantic Guidance Loss**:
\[
\mathcal{L}_{\text{sem}}=\frac{1}{|\mathcal{P}|} \sum_{(G, g) \in \mathcal{P}} \max(1 - m_g^G - s(f_x^G, p_g^G), 0)
\]
where $\mathcal{P}=\{(H, h),(U, u),(L, l),(I, i),(C, c)\}$ is the set of pairings of attribute groups and corresponding SID labels, $f_x^G$ is the representation of image $x$ for group $G$, $p_g^G$ is the prototype of SID $g$ in group $G$, $s(\cdot, \cdot)$ computes the cosine similarity between its inputs, and $m_g^G$ is the margin, defined as:
\[
m_g^G = \log\left(\alpha\cdot\frac{N_g^G}{N}+\beta\right)
\]
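The semantic guidance loss can be written directly from the two formulas above. The NumPy sketch below assumes that $N_g^G$ is the number of training samples carrying SID $g$ in group $G$ and that $N$ is the total number of training samples. The values of $\alpha$ and $\beta$ are left to the caller because they are not given here, and the function names are hypothetical.

```python
import numpy as np


def cosine(a, b):
    """s(.,.) in the formula: cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def sid_margin(n_sid, n_total, alpha, beta):
    """m_g^G = log(alpha * N_g^G / N + beta); N_g^G and N are assumed sample counts."""
    return float(np.log(alpha * n_sid / n_total + beta))


def semantic_guidance_loss(pairs, alpha, beta):
    """Average hinge term over the (group, SID) pairings in P.

    pairs: list of (f_x^G, p_g^G, N_g^G, N) tuples, one per attribute group
    """
    terms = []
    for feat, proto, n_sid, n_total in pairs:
        m = sid_margin(n_sid, n_total, alpha, beta)
        # max(1 - m_g^G - s(f_x^G, p_g^G), 0)
        terms.append(max(1.0 - m - cosine(feat, proto), 0.0))
    return sum(terms) / len(pairs)
```

The hinge vanishes once $s(f_x^G, p_g^G) \ge 1 - m_g^G$. The margin therefore sets how close each representation must be to its SID prototype before it stops contributing to the loss.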
2. **Identification Loss**:
\[
\mathcal{L}_{\text{id}}=\frac{1}{|\mathcal{G}|} \sum_{G \in \mathcal{G}}\left(-\log p(y_x|f_x^G)+\log\left(1+\exp(d(f_x^G, f_p^G)-d(f_x^G, f_n^G))\right)\right)
\]
where $\mathcal{G}=\{H, U, L, I, C\}$, $p(y_x|f_x^G)$ is the probability that the representation $f_x^G$ belongs to the ID label $y_