Cross-modal Fabric Image-Text Retrieval Based on Convolutional Neural Network and TinyBERT

Jun Xiang, Ning Zhang, Ruru Pan
DOI: https://doi.org/10.1007/s11042-023-17903-4
IF: 2.577
2023-01-01
Multimedia Tools and Applications
Abstract: The rapid renewal of fabric products makes it increasingly difficult for enterprises to retrieve their existing products. Unimodal retrieval methods can take advantage of historical production experience, but they cannot meet users’ varied retrieval requests. Cross-modal image-text retrieval can quickly return technical descriptions or intended images, which is an urgent demand in the textile industry. In this paper, a novel cross-modal fabric image-text retrieval method is proposed based on fabric characteristics. A convolutional neural network with a compact structure and cross-domain connections is designed to represent the visual content of fabric images. Then, a fine-tuned TinyBERT is applied to embed the textual description into a vector. The representations of the two modalities are aligned in the same Hamming space. Finally, a cross-modal retrieval strategy is designed based on the features of the different modalities. A fabric dataset containing over 40,000 image-text pairs is built as a benchmark to verify the proposed method. Extensive experiments on this dataset indicate that the proposed scheme is feasible and effective, and that it outperforms existing methods proposed for public datasets. The proposed method can provide reference assistance for production crews in fabric factories.
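The abstract states that both modalities are aligned in a shared Hamming space but does not give the hashing or ranking details. The following is only a minimal sketch of how retrieval in such a space generally works, assuming real-valued encoder outputs are binarized by thresholding at zero and candidates are ranked by Hamming distance. The function names `binarize` and `hamming_rank` and the toy feature values are hypothetical and are not taken from the paper.

```python
import numpy as np


def binarize(features):
    """Map real-valued features to {0, 1} codes by thresholding at zero.

    Assumption: the paper's actual hashing layer is not described in the
    abstract; a simple sign threshold is used here for illustration.
    """
    return (np.asarray(features) > 0).astype(np.uint8)


def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to a single query code."""
    # Count the bit positions where each database code differs from the query.
    distances = np.count_nonzero(db_codes != query_code, axis=1)
    # A stable sort keeps the original order among equally distant items.
    order = np.argsort(distances, kind="stable")
    return order, distances


# Toy stand-ins for encoder outputs (hypothetical values, 4-bit codes).
image_features = np.array([
    [0.8, -0.3, 0.5, -0.9],   # image 0
    [-0.4, -0.7, 0.6, 0.2],   # image 1
    [0.1, 0.9, 0.3, 0.4],     # image 2
])
text_feature = np.array([0.5, -0.2, 0.9, -0.3])  # embedding of a text query

image_codes = binarize(image_features)
query_code = binarize(text_feature)

# Text-to-image retrieval: the closest image codes come first.
order, distances = hamming_rank(query_code, image_codes)
print("ranking:", order.tolist())
print("distances:", distances.tolist())
```

Image-to-text retrieval would follow the same pattern with the roles swapped: binarize the image feature as the query and rank a database of text codes.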