Multimodal Label Relevance Ranking via Reinforcement Learning

Taian Guo, Taolin Zhang, Haoqian Wu, Hanjun Li, Ruizhi Qiao, Xing Sun
2024-07-18
Abstract: Conventional multi-label recognition methods often focus on label confidence, frequently overlooking the pivotal role of partial order relations consistent with human preference. To resolve these issues, we introduce a novel method for multimodal label relevance ranking, named Label Relevance Ranking with Proximal Policy Optimization (LR²PPO), which effectively discerns partial order relations among labels. LR²PPO first utilizes partial order pairs in the target domain to train a reward model, which aims to capture human preference intrinsic to the specific scenario. Furthermore, we meticulously design state representation and a policy loss tailored for ranking tasks, enabling LR²PPO to boost the performance of label relevance ranking model and largely reduce the requirement of partial order annotation for transferring to new scenes. To assist in the evaluation of our approach and similar methods, we further propose a novel benchmark dataset, LRMovieNet, featuring multimodal labels and their corresponding partial order data. Extensive experiments demonstrate that our LR²PPO algorithm achieves state-of-the-art performance, proving its effectiveness in addressing the multimodal label relevance ranking problem. Codes and the proposed LRMovieNet dataset are publicly available at https://github.com/ChazzyGordon/LR2PPO.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses multimodal label relevance ranking. Traditional multi-label recognition methods usually focus on label confidence and ignore the partial order relationships among labels that match human preferences. To close this gap, the authors propose Label Relevance Ranking with Proximal Policy Optimization (LR2PPO), which aims to distinguish the partial order relationships between labels and bring the model's rankings closer to human preferences.

### Core of the problem

1. **Label confidence vs. label relevance**: Traditional methods mainly estimate label confidence (i.e., the probability that a label applies to a given input) but ignore how relevant each label is to the input. For example, in a movie scene, the label "person" may have high confidence, while the label "flirt" may fit the theme of the scene better and therefore be more relevant.
2. **Partial order relationships**: When humans interpret multimodal data, they often impose a partial order on labels that reflects importance or priority. Existing methods do not fully capture such partial order relationships.
3. **Transfer learning**: Transferring label relevance ranking from existing scenarios to new ones is also a challenge. New scenarios may contain new labels or video clips, and annotating this data is costly.

### Solutions

To solve these problems, the authors propose the LR2PPO algorithm. Its main contributions are:

1. **Introducing partial order relationships**: A reward model is trained on partial order pairs from the target domain, so that it captures the human preferences specific to that scenario.
2. **Designing a state representation and policy loss function suited to ranking tasks**: By redefining the state and the policy loss function, LR2PPO can mine the partial order relationships between labels and thereby improve label relevance ranking performance.
3. **Creating a benchmark dataset**: To evaluate LR2PPO, the authors construct a new multimodal label relevance ranking benchmark, LRMovieNet, which contains multimodal labels and their corresponding partial order data.

### Method overview

- **Stage 1**: Train a base label relevance ranking model (the Actor) on the source domain.
- **Stage 2**: Train a reward model using a small number of partial order pair annotations from the target domain together with enhanced partial order pair samples from the source domain.
- **Stage 3**: Jointly train the Actor and Critic models, using the reward model to guide the Actor network toward the partial order relationships of the target domain.

In this way, LR2PPO improves label relevance ranking performance while reducing the number of partial order annotations required, which makes the model easier to transfer to new scenarios.

### Experimental results

According to the paper's experiments, LR2PPO outperforms existing methods on multiple metrics and achieves state-of-the-art performance on the multimodal label relevance ranking task. The authors take this as evidence that the method is effective for this problem.

### Summary

This paper tackles multimodal label relevance ranking by combining partial order relationships with reinforcement learning, and it contributes both a method (LR2PPO) and a benchmark dataset (LRMovieNet) for future research.
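
To make Stage 2 of the method overview more concrete, the following is a minimal sketch of how a reward model could be scored against partial order pairs. It assumes a Bradley-Terry style pairwise logistic loss, which is a common choice for preference-based reward models; the summary does not state the loss the authors actually use. The function names, the label scores, and the pairs are all hypothetical and serve only as illustration.

```python
import math
from typing import Dict, List, Tuple


def pairwise_preference_loss(score_preferred: float, score_other: float) -> float:
    """Return -log(sigmoid(score_preferred - score_other)).

    Assumption: a Bradley-Terry style loss, not necessarily the paper's loss.
    Written as a numerically stable softplus of the negative margin.
    """
    margin = score_preferred - score_other
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pair_loss(scores: Dict[str, float], pairs: List[Tuple[str, str]]) -> float:
    """Average loss over (more_relevant, less_relevant) label pairs."""
    losses = [pairwise_preference_loss(scores[a], scores[b]) for a, b in pairs]
    return sum(losses) / len(losses)


def pair_accuracy(scores: Dict[str, float], pairs: List[Tuple[str, str]]) -> float:
    """Fraction of pairs whose preferred label receives the higher score."""
    correct = sum(1 for a, b in pairs if scores[a] > scores[b])
    return correct / len(pairs)


# Hypothetical reward-model scores for labels of one movie clip.
scores = {"flirt": 2.0, "person": 0.5, "street": -1.0}
# Hypothetical annotations: the first label is more relevant than the second.
pairs = [("flirt", "person"), ("person", "street")]

loss = mean_pair_loss(scores, pairs)
accuracy = pair_accuracy(scores, pairs)
print(f"mean pairwise loss: {loss:.4f}, pair accuracy: {accuracy:.2f}")
```

Minimizing this loss pushes the score of the preferred label above the score of the other label in each pair. Because only the score difference matters, the annotations need to specify relative order, not absolute relevance values, which fits the partial order setting described above.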