Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference

Najmeh Forouzandehmehr, Nima Farrokhsiar, Ramin Giahi, Evren Korpeoglu, Kannan Achan
2024-09-19
Abstract: Personalized outfit recommendation remains a complex challenge, demanding both an understanding of fashion compatibility and awareness of trends. This paper presents a novel framework that harnesses the expressive power of large language models (LLMs) for this task, mitigating their "black box" and static nature through fine-tuning and direct feedback integration. We bridge the visual-textual gap in item descriptions by employing image captioning with a Multimodal Large Language Model (MLLM). This enables the LLM to extract style and color characteristics from human-curated fashion images, forming the basis for personalized recommendations. The LLM is efficiently fine-tuned on the open-source Polyvore dataset of curated fashion images, optimizing its ability to recommend stylish outfits. A direct preference mechanism using negative examples is employed to enhance the LLM's decision-making process. This creates a self-enhancing AI feedback loop that continuously refines recommendations in line with seasonal fashion trends. Our framework is evaluated on the Polyvore dataset, demonstrating its effectiveness in two key tasks: fill-in-the-blank and complementary item retrieval. These evaluations underline the framework's ability to generate stylish, trend-aligned outfit suggestions that continuously improve through direct feedback. The evaluation results demonstrate that our proposed framework significantly outperforms the base LLM, creating more cohesive outfits. The improved performance in these tasks underscores the framework's potential to enhance the shopping experience with accurate suggestions, confirming its effectiveness over vanilla LLM-based outfit generation.
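A minimal sketch of how caption-based fill-in-the-blank (FITB) evaluation might be wired up, assuming each item has already been captioned by an MLLM so that a text-only LLM can answer a lettered multiple-choice prompt. The prompt wording, `FITBQuestion`, and `ask_llm` (any callable that sends a prompt to the model and returns its text reply) are hypothetical; the summary does not specify the paper's actual prompt format.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class FITBQuestion:
    """One fill-in-the-blank question; every item is a text caption (hypothetical schema)."""
    outfit_captions: List[str]     # captions of the items already in the outfit
    candidate_captions: List[str]  # captions of the candidate missing items
    answer_index: int              # index of the ground-truth candidate


def build_prompt(q: FITBQuestion) -> str:
    """Format a question as a lettered multiple-choice prompt for a text-only LLM."""
    lines = ["These items are part of one outfit:"]
    lines += [f"- {caption}" for caption in q.outfit_captions]
    lines.append("Which candidate best completes the outfit? Reply with its letter only.")
    for i, caption in enumerate(q.candidate_captions):
        lines.append(f"{chr(ord('A') + i)}. {caption}")
    return "\n".join(lines)


def parse_choice(reply: str, n_candidates: int) -> Optional[int]:
    """Map a reply such as 'B' to a candidate index; None if it is not a valid letter.

    Only the first non-space character is read, so this assumes the model
    follows the 'letter only' instruction.
    """
    reply = reply.strip().upper()
    if not reply:
        return None
    index = ord(reply[0]) - ord("A")
    return index if 0 <= index < n_candidates else None


def fitb_accuracy(questions: Sequence[FITBQuestion], ask_llm: Callable[[str], str]) -> float:
    """Fraction of questions for which the LLM picks the ground-truth item."""
    if not questions:
        return 0.0
    correct = 0
    for q in questions:
        choice = parse_choice(ask_llm(build_prompt(q)), len(q.candidate_captions))
        correct += int(choice == q.answer_index)
    return correct / len(questions)


# Usage with a stub in place of a real (fine-tuned) LLM call.
questions = [
    FITBQuestion(
        outfit_captions=["white linen button-down shirt", "slim navy chinos"],
        candidate_captions=["brown suede loafers", "neon ski goggles", "red rain boots"],
        answer_index=0,
    )
]
print(fitb_accuracy(questions, lambda prompt: "A"))  # 1.0
```

Comparing the base and fine-tuned models then amounts to calling `fitb_accuracy` with two different `ask_llm` callables on the same questions; keeping the prompt fixed isolates the effect of fine-tuning.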
Information Retrieval, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses personalized outfit recommendation. Its goal is an automated recommendation system that understands the compatibility between clothing items, stays aware of current fashion trends, and tailors its suggestions to users' preferences. To this end, the paper proposes a framework that leverages the expressive power of large language models (LLMs) and mitigates their "black box" and static nature through fine-tuning and direct feedback integration.

The framework closes the visual-textual gap in item descriptions by using a multimodal large language model (MLLM) to caption item images. This lets the LLM extract style and color features from human-curated fashion images, which form the basis for personalized recommendations. The LLM is efficiently fine-tuned on the open-source Polyvore dataset to improve its ability to recommend stylish outfits, and a direct preference mechanism that uses negative examples further refines its decisions. Together these form a self-enhancing AI feedback loop that keeps recommendations aligned with seasonal fashion trends. In experiments on Polyvore, the framework significantly outperforms the base LLM on two tasks: fill-in-the-blank and complementary item retrieval.
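The summary does not spell out the preference objective. Assuming the "direct preference mechanism using negative examples" follows the standard Direct Preference Optimization (DPO) loss, one training signal can be sketched as below. How negative examples are selected is also an assumption: here each pair uses the model's own wrong prediction as the rejected answer, which is one plausible way to realize the feedback loop described above. `PreferencePair`, `build_preference_pairs`, and the `beta = 0.1` default are hypothetical, and log-probabilities are passed as plain floats instead of being computed by a real model.

```python
import math
from typing import List, NamedTuple, Sequence


class PreferencePair(NamedTuple):
    prompt: str
    chosen: str    # preferred completion: the ground-truth item
    rejected: str  # negative example: here (by assumption) the model's wrong prediction


def build_preference_pairs(
    prompts: Sequence[str], answers: Sequence[str], predictions: Sequence[str]
) -> List[PreferencePair]:
    """Keep only the model's mistakes, pairing each with the correct answer."""
    return [
        PreferencePair(p, answer, pred)
        for p, answer, pred in zip(prompts, answers, predictions)
        if pred != answer
    ]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one pair: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each *_logp is the summed log-probability of a completion given the prompt,
    under either the policy being trained or the frozen reference model.
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    return softplus(-margin)  # -log(sigmoid(m)) == softplus(-m)


pairs = build_preference_pairs(prompts=["q1", "q2"], answers=["A", "C"], predictions=["A", "B"])
print(pairs)  # [PreferencePair(prompt='q2', chosen='C', rejected='B')]
# When policy == reference the loss is log(2), about 0.693; it falls as the policy
# raises the chosen completion's probability relative to the rejected one.
print(round(dpo_loss(-4.0, -6.0, -5.0, -5.0), 4))  # 0.5981
```

In a real training run the per-pair losses would be averaged over a batch and minimized with respect to the policy's parameters only, while the reference model stays frozen; `beta` controls how far the policy may drift from the reference.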