MMCRec: Towards Multi-modal Generative AI in Conversational Recommendation

Tendai Mukande, Esraa Ali, Annalina Caputo, Ruihai Dong, Noel E. O'Connor
DOI: https://doi.org/10.1007/978-3-031-56063-7_23
2024-01-01
Abstract: Personalized recommendation systems have become integral in the digital age, helping users discover content and products tailored to their preferences. Since the Generative Artificial Intelligence (GAI) boom, GAI-enhanced Conversational Recommender Systems (CRSs) have attracted great research interest. Most existing methods, however, rely on a single input modality such as text, which limits their ability to capture content diversity. This is also inconsistent with real-world scenarios, which involve multi-modal input and output data. To address these limitations, we propose the Multi-Modal Conversational Recommender System (MMCRec) model, which harnesses multiple modalities, including text, images, voice, and video, to enhance recommendation performance and experience. Our model is capable of not only accepting multi-modal input but also generating multi-modal output in conversational recommendation. Experimental evaluations demonstrate the effectiveness of our model in real-world conversational recommendation scenarios.