Cross-Modal Recipe Retrieval with Self-Attention Mechanism

LIN Yang, CHU Xu, WANG Yasha, MAO Weijia, ZHAO Junfeng
DOI: https://doi.org/10.3778/j.issn.1673-9418.1912016
2020-01-01
Abstract: Tracking food intake is central to diet management. To simplify the recording process, researchers have proposed recipe retrieval based on food pictures: the recipe corresponding to a photographed dish is retrieved, and nutrient information is then inferred from it, which makes dietary recording more convenient. Recipe retrieval is a typical cross-modal retrieval problem, but it is harder than the general case because a recipe does not describe the visible features of the food picture; it describes the procedure by which ingredients become the final dish, so the model must understand the cooking process. Existing work, however, processes recipe text with traditional sequential models and therefore fails to capture long-range dependencies in the cooking process. To address this problem, this paper proposes a cross-modal recipe retrieval model based on the self-attention mechanism. It employs the self-attention mechanism of the Transformer model to capture long-range dependencies in recipes, and it improves the attention mechanism used in previous work, enabling the model to better capture the semantic information in recipes. Experimental results show that the model outperforms the baselines by 22% in recall on the recipe retrieval task.
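To illustrate how self-attention can relate distant steps of a recipe, the following is a minimal sketch of scaled dot-product self-attention in plain Python. It is not the paper's model: the identity query/key/value projections, the single head, the toy step embeddings, and the names `softmax` and `self_attention` are all assumptions made for illustration, whereas a Transformer uses learned projections and multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(steps):
    """Scaled dot-product self-attention with identity projections (assumption).

    steps: list of equal-length float vectors, one per recipe step.
    Returns (outputs, weights), where weights[i][j] is how strongly
    step i attends to step j, regardless of how far apart they are.
    """
    d = len(steps[0])
    outputs, weights = [], []
    for q in steps:
        # Similarity of this step to every step, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in steps]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of all step vectors.
        outputs.append(
            [sum(w[j] * steps[j][t] for j in range(len(steps))) for t in range(d)]
        )
    return outputs, weights


# Hypothetical toy embeddings: the first and last steps share the same
# vector, standing in for two distant steps that involve the same ingredient.
steps = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
outputs, weights = self_attention(steps)
```

In this toy input the first step attends to the last step as strongly as to itself, and more strongly than to the steps in between. A sequential model would have to carry that information through every intermediate step, while self-attention scores each pair of steps directly.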