DLI-Net: Dual Local Interaction Network for Fine-Grained Sketch-Based Image Retrieval

Haifeng Sun, Jiaqing Xu, Jingyu Wang, Qi, Ce Ge, Jianxin Liao
DOI: https://doi.org/10.1109/tcsvt.2022.3171972
IF: 5.859
2022-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Fine-grained sketch-based image retrieval (FG-SBIR) is considered an ideal method of image retrieval because sketches are rich in detail and easy to obtain. It aims to find the photo in a gallery that is most similar to an input sketch. Most previous works follow the same paradigm: they first extract global features and then project the sketch and photo features into a unified embedding space using triplet loss. However, global features are not well suited to capturing the crucial fine-grained information. Based on this observation, we propose the Dual Local Interaction Network (DLI-Net), which explores an effective and efficient way to utilize local features for FG-SBIR. Specifically, we first propose a Local Feature Extractor to extract mid-level local features. Then, to address the problems that local features introduce, we propose a Dual Interaction Module, which consists of a Self Interaction Module and a Cross Interaction Module. The Self Interaction Module speeds up retrieval by eliminating redundant local features from the background. The Cross Interaction Module resolves spatial misalignment by making sketches interact with photos. Extensive experiments on six commonly used datasets show that DLI-Net outperforms state-of-the-art competitors by a significant margin while maintaining a reasonable retrieval speed. Moreover, to the best of our knowledge, DLI-Net is the first model to beat humans on all six datasets. DLI-Net also performs best on the cross-category FG-SBIR task, which further demonstrates that local features are more appropriate for FG-SBIR.
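The global-feature paradigm that the abstract contrasts against can be illustrated with a minimal sketch of triplet loss and nearest-neighbour retrieval in a shared embedding space. This is a generic illustration, not DLI-Net or the authors' code: the function names, the Euclidean distance, and the margin value of 0.2 are all assumptions chosen for the example, and the feature vectors stand in for the output of some unspecified encoder.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(sketch, pos_photo, neg_photo, margin=0.2):
    """Hinge-style triplet loss on one (sketch, matching photo, other photo) triple.

    The loss is zero once the matching photo is closer to the sketch than the
    non-matching photo by at least `margin` (an assumed hyperparameter).
    """
    d_pos = euclidean(sketch, pos_photo)
    d_neg = euclidean(sketch, neg_photo)
    return max(0.0, margin + d_pos - d_neg)


def retrieve(sketch, gallery):
    """Return gallery indices sorted from most to least similar to the sketch."""
    return sorted(range(len(gallery)), key=lambda i: euclidean(sketch, gallery[i]))


if __name__ == "__main__":
    # Toy 2-D embeddings; real embeddings would come from a trained encoder.
    query = [0.0, 0.0]
    photos = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
    print(retrieve(query, photos))
    print(triplet_loss(query, photos[1], photos[0]))
```

Because each image is reduced to a single vector here, any fine-grained spatial detail has to survive that one global summary, which is the limitation the abstract points to when motivating local features.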