Relation-Aware Meta-Learning for Zero-shot Sketch-Based Image Retrieval

Yang Liu, Jiale Du, Xinbo Gao, Jungong Han
2024-11-28
Abstract: Sketch-based image retrieval (SBIR) relies on free-hand sketches to retrieve natural photos within the same class. However, its practical application is limited by its inability to retrieve classes absent from the training set. To address this limitation, the task has evolved into Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR), where model performance is evaluated on unseen categories. Traditional SBIR primarily focuses on narrowing the domain gap between photo and sketch modalities. However, in the zero-shot setting, the model not only needs to address this cross-modal discrepancy but also requires a strong generalization capability to transfer knowledge to unseen categories. To this end, we propose a novel framework for ZS-SBIR that employs a pair-based relation-aware quadruplet loss to bridge feature gaps. By incorporating two negative samples from different modalities, the approach prevents positive features from becoming disproportionately distant from one modality while remaining close to another, thus enhancing inter-class separability. We also propose a Relation-Aware Meta-Learning Network (RAMLN) to obtain the margin, a hyper-parameter of cross-modal quadruplet loss, to improve the generalization ability of the model. RAMLN leverages external memory to store feature information, which it utilizes to assign optimal margin values. Experimental results obtained on the extended Sketchy and TU-Berlin datasets show a sharp improvement over existing state-of-the-art methods in ZS-SBIR.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses two problems in **Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR)**: the cross-modal gap and the lack of generalization ability. Traditional Sketch-Based Image Retrieval (SBIR) models perform poorly on unseen classes because they can usually handle only classes that appeared in the training set. The ZS-SBIR task requires the model to handle unseen classes at test time, which makes the task harder.

### Main challenges:

1. **Cross-modal gap**: Sketches and natural photos differ greatly in their visual features, making it difficult for the model to establish an effective mapping between them.
2. **Lack of generalization ability**: Traditional SBIR methods mainly focus on reducing the modality gap between sketches and photos, but in the zero-shot setting the model also needs strong generalization ability to transfer knowledge to unseen classes.

### Solutions:

To solve these problems, the authors propose a novel framework with the following main parts:

1. **Relation-Aware Quadruplet Loss**:
   - By introducing two negative samples from different modalities, it prevents the positive sample features from drifting too far from one modality while remaining close to the other, thereby enhancing inter-class separability.
   - It uses the normalized Euclidean distance as a metric and designs two types of quadruplets: global cross-modal quadruplets and local intra-modal quadruplets.
2. **Relation-Aware Meta-Learning Network (RAMLN)**:
   - It introduces an external memory matrix to store feature information and uses this information to dynamically adjust the margin in the quadruplet loss, improving the model's generalization ability.
   - The meta-learning strategy adaptively determines the margin, so as to better cope with the domain changes between different classes and modalities.
3. **Classification loss and projection layer**:
   - In addition to the quadruplet loss, it uses a cross-entropy loss combined with the Softmax function for classification training, to avoid falling into local optima.

### Experimental results:

The experimental results show that this method significantly outperforms existing state-of-the-art methods on the extended Sketchy and TU-Berlin datasets, where evaluation is carried out on unseen classes. Through these contributions, the paper addresses the key challenges of the ZS-SBIR task and improves the performance and generalization ability of the model in cross-modal retrieval.
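
As an illustration of the quadruplet idea summarized above, here is a minimal sketch of a hinge-style loss with two negatives, one per modality. It is not the authors' implementation: the exact loss form, the reading of "normalized Euclidean distance" as the Euclidean distance between L2-normalized features, and all function and parameter names (`l2_normalize`, `normalized_distance`, `quadruplet_loss`, `neg_photo`, `neg_sketch`) are assumptions made for this example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def normalized_distance(u, v):
    """Euclidean distance between the L2-normalized versions of u and v.

    Assumption: this is one plausible reading of "normalized Euclidean
    distance"; the summary does not give the exact definition.
    """
    u, v = l2_normalize(u), l2_normalize(v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def quadruplet_loss(anchor, positive, neg_photo, neg_sketch, margin):
    """Hinge loss over one quadruplet with a negative from each modality.

    Assumed form: the anchor-positive distance must be smaller than the
    distance to each negative by at least `margin`; the two hinge terms
    are summed.
    """
    d_pos = normalized_distance(anchor, positive)
    loss_photo = max(0.0, d_pos - normalized_distance(anchor, neg_photo) + margin)
    loss_sketch = max(0.0, d_pos - normalized_distance(anchor, neg_sketch) + margin)
    return loss_photo + loss_sketch


# Toy 2-D features: the positive points the same way as the anchor,
# both negatives are orthogonal to it.
anchor, positive = [1.0, 0.0], [2.0, 0.0]
neg_photo, neg_sketch = [0.0, 1.0], [0.0, -3.0]
small_margin_loss = quadruplet_loss(anchor, positive, neg_photo, neg_sketch, margin=0.5)
large_margin_loss = quadruplet_loss(anchor, positive, neg_photo, neg_sketch, margin=2.0)
```

In this sketch `margin` is a fixed float supplied by the caller. In the framework described above, the margin is instead produced by RAMLN from its external memory; that component is not reproduced here because the summary does not specify how it is computed.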