LLM-MHR: A LLM-Augmented Multimodal Hashtag Recommendation Algorithm

Zhijie Tan, Yuzhi Li, Xiang Yuan, Shengwei Meng, Weiping Li, Tong Mo
DOI: https://doi.org/10.1109/icws62655.2024.00100
2024-01-01
Abstract: Recommending suitable hashtags for microposts with multimodal content is a pivotal challenge for many Social Networking Service (SNS) applications such as Instagram and Weibo. The accuracy of multimodal hashtag recommendation algorithms relies heavily on the comprehension of multimodal information and user historical information, and on the ability to reason over such information. However, most previous works have not effectively utilized both historical and additional information simultaneously. Large Language Models (LLMs) learn a vast amount of implicit knowledge during the pre-training stage, so they can serve as potential knowledge bases while also possessing strong reasoning abilities. Therefore, LLMs can provide additional information to help understand the micropost content and infer suitable hashtags. However, introducing LLMs for multimodal hashtag recommendation faces three main challenges. Firstly, LLMs require an efficient modality alignment module to accept multimodal input. Secondly, LLMs are highly sensitive to input order, while utilizing user historical information requires accepting multiple historical samples, so a robust historical information processing module is needed to eliminate the influence of input order. Thirdly, fine-tuning LLMs entails substantial computational overhead, so the number of additional trainable parameters must be kept small. To address the first two challenges, this paper designs an efficient modality alignment module that can process multiple historical samples and that also addresses the sensitivity of LLMs to changes in input order. To tackle the third challenge, a hybrid prompt learning approach utilizing both soft and hard prompts is proposed to achieve parameter-efficient fine-tuning of LLMs. Finally, an LLM-augmented Multimodal Hashtag Recommendation algorithm (LLM-MHR) is implemented. Comprehensive experiments on the representative dataset MACON demonstrate that LLM-MHR achieves state-of-the-art performance with significant improvements.
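The abstract does not specify how the historical information module removes order sensitivity or how the hybrid prompt is laid out. As a purely illustrative sketch, one common way to make an aggregate of several history embeddings independent of their order is element-wise mean pooling, and one possible input layout places trainable soft-prompt vectors before the pooled history and fixed hard-prompt embeddings after the current post. Mean pooling, the layout, and the names `pool_history` and `build_input` are all assumptions made for this example, not details taken from the paper.

```python
import math
from itertools import permutations
from typing import List

Vector = List[float]


def pool_history(history: List[Vector]) -> Vector:
    """Element-wise mean of history embeddings (assumed pooling, not the paper's)."""
    if not history:
        raise ValueError("history must contain at least one embedding")
    n = len(history)
    # math.fsum returns a correctly rounded sum, so the result does not
    # depend on the order in which the history samples are listed.
    return [math.fsum(column) / n for column in zip(*history)]


def build_input(soft_prompt: List[Vector], history: List[Vector],
                post: Vector, hard_prompt: List[Vector]) -> List[Vector]:
    """Hypothetical layout: [soft prompt | pooled history | current post | hard prompt]."""
    return [*soft_prompt, pool_history(history), post, *hard_prompt]


# Toy 2-dimensional embeddings standing in for aligned multimodal features.
history = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
pooled = pool_history(history)

# Every ordering of the history yields the same pooled vector.
order_invariant = all(pool_history(list(p)) == pooled for p in permutations(history))

sequence = build_input(
    soft_prompt=[[0.0, 0.0], [0.1, 0.1]],               # would be trainable in practice
    history=history,
    post=[0.5, 0.5],
    hard_prompt=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],   # fixed text-token embeddings
)
```

In this sketch only the soft-prompt vectors would be updated during training, which is the general idea behind parameter-efficient prompt tuning. Mean pooling discards any temporal ordering of the history, which is the price paid for order invariance in this simplified form.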