Fabric image retrieval based on multi-modal feature fusion
Ning Zhang, Yixin Liu, Zhongjian Li, Jun Xiang, Ruru Pan
DOI: https://doi.org/10.1007/s11760-023-02889-1
IF: 1.583
2024-01-20
Signal Image and Video Processing
Abstract: With the growth of multi-source heterogeneous data, flexible retrieval across different modalities is urgently needed in industrial applications. To let users control the retrieval results, this paper proposes a novel fabric image retrieval method based on multi-modal feature fusion. First, image features are extracted with a modified pre-trained convolutional neural network that separates macroscopic and fine-grained features, which are then selected and aggregated by a multi-layer perceptron. The feature of the modification text is extracted by a long short-term memory network. The two features are then fused in a visual-semantic joint embedding space through gated and residual structures, which control the selective expression of the separable image features. To validate the proposed scheme, a fabric image database for multi-modal retrieval is created as a benchmark. Qualitative and quantitative experiments indicate that the proposed method is practical and effective; it can also be extended to similar industrial fields, such as wood and wallpaper.
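The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes one common gated-plus-residual formulation: a sigmoid gate computed from the concatenated image and text features scales the image feature, and a residual branch adds a text-driven modification. The abstract does not give the paper's exact layers, so the function names (`init_params`, `fuse`, `retrieve`), the branch weights `w_gate` and `w_res`, the dimensions, and the random weights are all hypothetical stand-ins for trained components; the input vectors stand in for the CNN and LSTM outputs.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(img_dim, txt_dim, rng, scale=0.1):
    """Random weights standing in for trained parameters (assumption)."""
    d = img_dim + txt_dim
    return {
        "Wg1": rng.normal(0.0, scale, (d, img_dim)),        # gate branch, layer 1
        "Wg2": rng.normal(0.0, scale, (img_dim, img_dim)),  # gate branch, layer 2
        "Wr1": rng.normal(0.0, scale, (d, img_dim)),        # residual branch, layer 1
        "Wr2": rng.normal(0.0, scale, (img_dim, img_dim)),  # residual branch, layer 2
    }


def fuse(img_feat, txt_feat, params, w_gate=1.0, w_res=1.0):
    """Fuse image and text features into one query vector per row.

    The gate decides how much of each image dimension to keep;
    the residual branch adds the change requested by the text.
    """
    joint = np.concatenate([img_feat, txt_feat], axis=-1)
    gate = sigmoid(np.maximum(joint @ params["Wg1"], 0.0) @ params["Wg2"])
    gated = gate * img_feat
    residual = np.maximum(joint @ params["Wr1"], 0.0) @ params["Wr2"]
    return w_gate * gated + w_res * residual


def retrieve(query, gallery, k=5):
    """Return indices of the k gallery rows most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(img_dim=8, txt_dim=4, rng=rng)
    img = rng.normal(size=(1, 8))      # stand-in for a CNN image feature
    txt = rng.normal(size=(1, 4))      # stand-in for an LSTM text feature
    gallery = rng.normal(size=(20, 8)) # stand-in for indexed fabric features
    query = fuse(img, txt, params)[0]
    print(retrieve(query, gallery, k=3))
```

In this sketch the fused query lives in the same space as the gallery image features, so retrieval reduces to a nearest-neighbour search by cosine similarity. Setting `w_res=0.0` leaves only the gated image feature, which is a quick way to see how much the text branch changes the ranking.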
engineering, electrical & electronic; imaging science & photographic technology