Tran-GCN: Multi-label Pattern Image Retrieval via Transformer Driven Graph Convolutional Network

Ying Li, Chunming Guan, Rui Cai, Ye Erwan, Ding Yuxiang, Jiaquan Gao
DOI: https://doi.org/10.1145/3581783.3612216
2023-01-01
Abstract: Pattern images are artificially designed images with distinctive elements, styles, and arrangements. As the number of pattern images grows, pattern image retrieval emerges as a promising technique with significant potential for commercial and industrial applications such as fashion and home decoration, where it helps users quickly identify preferred print patterns. The main purpose of multi-label pattern image retrieval is to effectively represent and match images with their corresponding labels. Compared to conventional image retrieval, multi-label pattern image retrieval is more challenging because abstract print patterns carry richer semantic information and the relationships between multiple labels are complex. To tackle these challenges, we propose Tran-GCN, a model specifically designed for multi-label pattern image retrieval. The model is built upon a Transformer-based autoregressive architecture, which leverages image information to guide the exploration of correlations between different labels through the textual modality. Using this correlation information, we construct a graph convolutional network (GCN) to further enhance the correlations between image and label representations. More specifically, Tran-GCN uses a cross-modal attention mechanism at each layer to aggregate visual features from the input image and update label semantics through residual connections. The GCN module is updated based on the correlations between textual features, as represented in a relationship matrix. Extensive experiments on two widely used public visual benchmarks, MS-COCO and NUS-WIDE, as well as a multi-label pattern image dataset, Pattern 2, consistently demonstrate that Tran-GCN is suitable for general use and achieves superior performance in multi-label pattern image retrieval tasks.
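The data flow described in the abstract (label embeddings attend over image features, are updated through a residual connection, and are then refined by a GCN driven by a label relationship matrix) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. All function names, shapes, and random inputs are hypothetical; the single-head attention without learned projections, the cosine-similarity relationship matrix, and the mean-pooled image vector are simplifying assumptions; and the autoregressive Transformer component is omitted entirely.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(labels, patches):
    # Label embeddings (L, d) act as queries over image patch features (P, d).
    # Assumption: one head, no learned query/key/value projections.
    d = labels.shape[-1]
    scores = labels @ patches.T / np.sqrt(d)   # (L, P) label-to-patch affinities
    weights = softmax(scores, axis=-1)         # each label's weights over patches
    return labels + weights @ patches          # residual update of label semantics


def relation_matrix(labels):
    # Assumption: label correlations approximated by non-negative cosine
    # similarity between label embeddings, then row-normalized.
    unit = labels / np.linalg.norm(labels, axis=1, keepdims=True)
    sim = np.clip(unit @ unit.T, 0.0, None)    # diagonal stays at 1
    return sim / sim.sum(axis=1, keepdims=True)


def gcn_layer(adj, labels, weight):
    # One graph convolution: propagate over the label graph, project, ReLU.
    return np.maximum(adj @ labels @ weight, 0.0)


# Hypothetical toy inputs: 5 labels, 49 image patches, 16-dimensional features.
rng = np.random.default_rng(0)
num_labels, num_patches, dim = 5, 49, 16
labels = rng.normal(size=(num_labels, dim))
patches = rng.normal(size=(num_patches, dim))
weight = rng.normal(size=(dim, dim)) / np.sqrt(dim)

updated = cross_modal_attention(labels, patches)   # image-guided label features
adj = relation_matrix(updated)                     # label relationship matrix
refined = gcn_layer(adj, updated, weight)          # GCN-refined label features

# Assumption: score each label against a mean-pooled image vector.
image_vec = patches.mean(axis=0)
label_scores = refined @ image_vec
print(label_scores.shape)
```

In this sketch the relationship matrix is recomputed from the image-updated label features, mirroring the abstract's statement that the GCN is updated from correlations between textual features; how the paper actually derives that matrix is not specified in the abstract.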