Video-based Recipe Retrieval

Da Cao, Ning Han, Hao Chen, Xiaochi Wei, Xiangnan He
DOI: https://doi.org/10.1016/j.ins.2019.11.033
IF: 8.1
2019-01-01
Information Sciences
Abstract: Recipe retrieval, which focuses on retrieving a textual recipe given a text or an image as the query, has received great attention in the research community. However, cooking is an interesting activity, and many useful elements are hidden in dynamic videos that might be omitted from static texts and images. On the other hand, although a number of video-based retrieval methods have been investigated in the past, existing technologies mainly focus on general applications and seldom take domain-specific features into account. To bridge this gap, we investigate a new problem of video-based recipe retrieval, which refers to retrieving a cooking video from a list of video candidates given a textual recipe as the query, or vice versa. In this work, we first propose a hierarchical attention network to learn the representations of a textual recipe and its cooking procedures. Moreover, we employ reinforcement learning to dynamically locate a video moment given a cooking procedure as the query. Thereafter, the representations of video moments and cooking procedures are projected into a common space and optimized with a pairwise ranking loss, which is able to distinguish matched from unmatched video moment-cooking procedure pairs. The retrieval between cooking videos and textual recipes is then performed by assembling the matching results of video moments and cooking procedures. By experimenting on a self-collected dataset, we demonstrate the effectiveness and rationality of our proposed solution in terms of both overall performance comparison and micro-level analyses.
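The abstract does not give the exact form of the pairwise ranking loss, so the following is only a minimal sketch of one common variant: a bidirectional hinge loss over cosine similarities in the common space. The function names, the margin value, and the choice of cosine similarity are assumptions made for illustration, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_ranking_loss(moments, procedures, margin=0.2):
    """Hinge-style ranking loss (assumed form, not the paper's exact loss).

    moments[i] and procedures[i] are embeddings of a matched pair in the
    common space; every other pairing in the batch is treated as unmatched.
    """
    n = len(moments)
    total = 0.0
    for i in range(n):
        pos = cosine(moments[i], procedures[i])
        for j in range(n):
            if j == i:
                continue
            # Procedure i paired with a wrong video moment j.
            total += max(0.0, margin - pos + cosine(moments[j], procedures[i]))
            # Video moment i paired with a wrong procedure j.
            total += max(0.0, margin - pos + cosine(moments[i], procedures[j]))
    return total / n


# Matched pairs that are well separated incur no loss.
aligned = pairwise_ranking_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
# Swapping the procedures makes every matched pair score below its negatives.
swapped = pairwise_ranking_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
```

The loss is zero once each matched pair scores higher than every unmatched pair by at least the margin, which is the sense in which such a loss distinguishes matched from unmatched pairs.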