VisualLens: Personalization through Visual History

Wang Bill Zhu, Deqing Fu, Kai Sun, Yi Lu, Zhaojiang Lin, Seungwhan Moon, Kanika Narang, Mustafa Canim, Yue Liu, Anuj Kumar, Xin Luna Dong
2024-11-25
Abstract: We hypothesize that a user's visual history, with images reflecting their daily life, offers valuable insights into their interests and preferences, and can be leveraged for personalization. Among the many challenges in achieving this goal, the foremost is the diversity and noise in the visual history, which contains images not necessarily related to a recommendation task, not necessarily reflecting the user's interest, or even not necessarily preference-relevant. Existing recommendation systems either rely on task-specific user interaction logs, such as online shopping history for shopping recommendations, or focus on text signals. We propose a novel approach, VisualLens, that extracts, filters, and refines image representations, and leverages these signals for personalization. We created two new benchmarks with task-agnostic visual histories, and show that our method improves over state-of-the-art recommendations by 5-10% on Hit@3, and improves over GPT-4o by 2-5%. Our approach paves the way for personalized recommendations in scenarios where traditional methods fail.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper attempts to improve the accuracy of personalized recommendations through users' visual history, that is, the photos taken or shared by users. Specifically, the paper proposes a new method named **VisualLens**, which aims to extract valuable information from users' life photos to better understand users' interests and preferences and provide more personalized recommendations.

### Background and challenges

1. **Diversity and noise**: Users' visual history usually contains a large number of images that are not directly related to specific recommendation tasks, and these images may not reflect users' actual interests or preferences.
2. **Hardware limitations**: Recording users' visual history needs to overcome hardware limitations such as battery life, thermal constraints, and storage capacity.
3. **Real-time response**: Many recommendation tasks require immediate responses, so low-latency solutions need to be developed.

### Solutions

1. **Offline history enhancement**:
   - **Image encoding**: Use the CLIP ViT-L/14@336px model to encode each image and generate visual embeddings.
   - **Generate image descriptions**: Use the frozen LLaVA-v1.6 8B model to generate image descriptions, limited to within 30 words to reduce hallucinations.
   - **Generate aspect words**: Extract the key feature words (aspect words) of the image, such as dome, balcony, plant, etc., which provide important details about the image.
2. **Runtime recommendation generation**:
   - **History retrieval**: According to the query q and the user's visual history H_u, retrieve the images related to q, and select at most w images to reduce the amount of processing.
   - **Preference profile generation**: Use the retrieved images, their descriptions, and aspect words to generate the user's preference profile.
   - **Candidate matching**: Match the user's preference profile with each candidate item to generate a confidence score for each candidate item for ranking.
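The runtime steps above (retrieve at most w history images, build a preference profile, score candidates) can be illustrated with a minimal sketch. It is not the paper's implementation: the embeddings are hand-written toy vectors rather than CLIP outputs, the profile is a plain mean embedding plus the union of aspect words, and candidate scores are cosine similarities standing in for the model-generated confidence scores. All function names and dictionary keys are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_history(query_vec, history, w):
    """Return at most w history entries, most similar to the query first."""
    ranked = sorted(
        history,
        key=lambda h: cosine(query_vec, h["embedding"]),
        reverse=True,
    )
    return ranked[:w]


def build_profile(retrieved):
    """Toy preference profile: mean embedding plus sorted union of aspect words.

    Assumption: the summary does not say how the profile is represented;
    this averaging scheme is only a stand-in.
    """
    if not retrieved:
        return {"embedding": [], "aspects": []}
    dim = len(retrieved[0]["embedding"])
    mean = [
        sum(h["embedding"][i] for h in retrieved) / len(retrieved)
        for i in range(dim)
    ]
    aspects = sorted({a for h in retrieved for a in h["aspects"]})
    return {"embedding": mean, "aspects": aspects}


def rank_candidates(profile, candidates):
    """Score each candidate against the profile and sort by descending score."""
    scored = [
        (c["name"], cosine(profile["embedding"], c["embedding"]))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: two history images related to the query, one unrelated.
history = [
    {"embedding": [1.0, 0.0, 0.0], "aspects": ["dome", "balcony"]},
    {"embedding": [0.9, 0.1, 0.0], "aspects": ["plant"]},
    {"embedding": [0.0, 0.0, 1.0], "aspects": ["receipt"]},
]
candidates = [
    {"name": "parking garage", "embedding": [0.0, 0.1, 0.9]},
    {"name": "botanical cafe", "embedding": [0.8, 0.2, 0.0]},
]

retrieved = retrieve_history([1.0, 0.2, 0.0], history, w=2)
profile = build_profile(retrieved)
ranking = rank_candidates(profile, candidates)
```

With this toy data the unrelated "receipt" image is dropped at the retrieval step, so it contributes neither to the profile embedding nor to the aspect words. Capping retrieval at w images is also what bounds the per-query cost.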
### Experimental results

- **Benchmark datasets**: Two new benchmark datasets, Google Review-V and Yelp-V, were created, which contain task-agnostic visual histories.
- **Performance improvement**: VisualLens improves over the best existing method by 5-10% on the Hit@3 metric, and improves over GPT-4o by 1.6% and 4.6% on the Google Review-V and Yelp-V benchmarks respectively.

### Conclusion

VisualLens improves the accuracy of personalized recommendations by leveraging users' visual history. This method opens up new ways to achieve personalized recommendations in scenarios where traditional methods fail.
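For readers unfamiliar with the Hit@3 metric cited in the experimental results, the sketch below uses the common hit-rate definition: the fraction of queries for which at least one relevant item appears among the top k ranked candidates. Whether the benchmarks compute it in exactly this way is an assumption, and the function name and arguments are hypothetical.

```python
def hit_at_k(ranked_lists, relevant_sets, k=3):
    """Fraction of queries with at least one relevant item in the top k.

    ranked_lists: one ranked list of candidate ids per query, best first.
    relevant_sets: one set of ground-truth ids per query, in the same order.
    """
    if not ranked_lists:
        return 0.0
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if any(item in relevant for item in ranked[:k])
    )
    return hits / len(ranked_lists)


# Two toy queries: the first has a relevant item at rank 3, the second at rank 4.
ranked = [["a", "b", "c", "d"], ["x", "y", "z", "w"]]
relevant = [{"c"}, {"w"}]
score = hit_at_k(ranked, relevant, k=3)
```

Here only the first query counts as a hit at k=3, so `score` is 0.5. Raising k to 4 would count both.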