Efficient low-rank multi-component fusion with component-specific factors in image-recipe retrieval

Wenyu Zhao, Dong Zhou, Buqing Cao, Kai Zhang, Jinjun Chen
DOI: https://doi.org/10.1007/s11042-023-15819-7
IF: 2.577
2023-05-19
Multimedia Tools and Applications
Abstract: Image-recipe retrieval is the task of retrieving closely related recipes from a collection given a food image, and vice versa. The modality gap between images and recipes makes it a challenging task. Recent studies usually focus on learning consistent image and recipe representations to bridge the semantic gap. Although existing methods have significantly improved image-recipe retrieval, several challenges remain: 1) Previous studies usually concatenate the textual embeddings of different recipe components directly to generate recipe representations, so only simple interactions, rather than complex ones, are considered. 2) They commonly focus on textual feature extraction from recipes. The methods used to extract image features are relatively simple, and most studies utilize the ResNet-50 model. 3) Apart from the retrieval learning loss (triplet loss, for example), several auxiliary loss functions (such as adversarial loss and reconstruction loss) are commonly used to match the recipe and image representations. To deal with these issues, we introduce a novel Low-rank Multi-component Fusion method with Component-Specific Factors (LMF-CSF) that models the different textual components in a recipe to produce superior textual representations. Furthermore, we give attention to image feature extraction: a visual transformer is used to learn better image representations. The enhanced representations from the two modalities are then fed directly into a triplet loss function for image-recipe retrieval learning. Experiments conducted on the Recipe1M dataset indicate that our LMF-CSF method can outperform the current state-of-the-art image-recipe retrieval baselines.
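To make the fusion idea in the abstract concrete, the following is a minimal NumPy sketch of generic low-rank fusion with one factor tensor per component, followed by a cosine-based triplet loss. It is an illustration under stated assumptions, not the authors' LMF-CSF implementation: the function names, the choice of three components (title, ingredients, instructions), the appended constant 1, the embedding sizes, the rank, and the margin of 0.3 are all hypothetical and do not come from the abstract.

```python
import numpy as np


def low_rank_fusion(components, factors):
    """Fuse component embeddings using one low-rank factor tensor per component.

    components: list of 1-D arrays, component m has shape (d_m,)
    factors:    list of arrays, factor m has shape (rank, d_m + 1, d_out)
    Returns a fused vector of shape (d_out,).
    """
    fused = None
    for z, w in zip(components, factors):
        z1 = np.append(z, 1.0)                # append a constant 1 (assumption)
        proj = np.einsum("d,rdo->ro", z1, w)  # per-rank projection: (rank, d_out)
        # Multiply projections across components, rank by rank.
        fused = proj if fused is None else fused * proj
    # Summing over the rank axis gives the same result as applying the full
    # fusion tensor, without ever materializing that tensor.
    return fused.sum(axis=0)


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Cosine-similarity triplet loss; the margin value is an assumption."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(0.0, margin - cos(anchor, positive) + cos(anchor, negative))


# Hypothetical sizes: three recipe components fused into a 16-d recipe vector.
rng = np.random.default_rng(0)
dims, rank, d_out = [8, 12, 10], 4, 16
components = [rng.normal(size=d) for d in dims]
factors = [rng.normal(size=(rank, d + 1, d_out)) for d in dims]

recipe_vec = low_rank_fusion(components, factors)
image_vec = rng.normal(size=d_out)      # stand-in for an image embedding
other_recipe = rng.normal(size=d_out)   # stand-in for a non-matching recipe
loss = triplet_loss(image_vec, recipe_vec, other_recipe)
```

The point of the per-component factors is cost: the parameter count grows with the sum of the component dimensions times the rank, whereas a full fusion tensor grows with their product.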
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering