Content-based image retrieval through fusion of deep features extracted from segmented neutrosophic using depth map

Fatemeh Taheri, Kambiz Rahbar, Ziaeddin Beheshtifard
DOI: https://doi.org/10.1007/s00371-024-03335-0
IF: 2.835
2024-04-10
The Visual Computer
Abstract: The main challenge of content-based image retrieval systems is the difference between how images are described using algorithms and how humans understand the semantic concepts of an image. To overcome this challenge, many image retrieval methods have focused on scenarios that emphasize important regions of an image. However, losing part of the semantic features of an image is a problem that also exists in these approaches. Therefore, this article introduces a method for image retrieval using the fusion of deep features on a segmented neutrosophic set with the help of the image depth map. By transferring the original image to the neutrosophic domain, the image is decomposed into three levels: true, false, and indeterminate. True and false images have different representations of image brightness. The indeterminate image represents the boundary between the true and false images. It is also a representation of the edges in the image. Convolutional layers of deep neural networks are sensitive to changes in image brightness when extracting feature maps. For this reason, the extracted features from the true and false images are different from each other and can be considered as complementary to each other. In the second step, the image depth map is estimated using a vision transformer. Then the estimated depth map is binarized using a predefined threshold. By applying the binarized depth map to the neutrosophic domain, objects in near and far regions are classified. Effective features of each region are extracted using a pre-trained deep neural network, VGG-16. Important features from each group of images are selected using the Boruta-Shap algorithm. Finally, to reduce redundancy and unify the extracted features, feature fusion is performed in two stages, resulting in the final feature vector for each image.
Experimental results confirm that extracting semantic and content features from different regions of an image using the proposed method leads to improved retrieval results and reduces semantic gaps.
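The first two steps of the abstract (decomposition into true, indeterminate, and false images, followed by a near/far split with a binarized depth map) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a commonly used neutrosophic formulation in which T is the normalized local mean, I is the normalized absolute deviation from that mean, and F = 1 - T; the abstract does not state the exact definitions. The function names, the window size, the threshold value, and the convention that larger depth values mean "near" are all hypothetical. The depth map is taken as an input array here, and the vision-transformer depth estimation, VGG-16 feature extraction, Boruta-Shap selection, and fusion stages are omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _min_max(x):
    """Scale an array to [0, 1]; a constant array maps to all zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def neutrosophic_components(gray, window=3):
    """Return (T, I, F) maps for a 2-D grayscale image.

    Assumed formulation (not taken from the paper): T is the normalized
    local mean, I is the normalized absolute deviation from that mean,
    and F = 1 - T. `window` must be odd.
    """
    gray = np.asarray(gray, dtype=np.float64)
    pad = window // 2
    # Replicate border pixels so the output keeps the input shape.
    padded = np.pad(gray, pad, mode="edge")
    local_mean = sliding_window_view(padded, (window, window)).mean(axis=(-2, -1))
    T = _min_max(local_mean)
    I = _min_max(np.abs(gray - local_mean))
    F = 1.0 - T
    return T, I, F


def split_by_depth(component, depth, threshold=0.5):
    """Mask a component into near and far regions with a binarized depth map.

    Assumption: larger depth values mean closer to the camera.
    """
    near_mask = np.asarray(depth) >= threshold
    near = np.where(near_mask, component, 0.0)
    far = np.where(near_mask, 0.0, component)
    return near, far
```

In a pipeline like the one described, each masked component (for example, the near and far parts of T and F) would then be passed to a feature extractor. Under the assumed formulation, T and F are brightness complements, which matches the abstract's description of the two images as complementary inputs.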
computer science, software engineering