Abstract: Conventional multi-label recognition methods often focus on label confidence, frequently overlooking the pivotal role of partial order relations consistent with human preference. To resolve this issue, we introduce a novel method for multimodal label relevance ranking, named Label Relevance Ranking with Proximal Policy Optimization (LR²PPO), which effectively discerns partial order relations among labels. LR²PPO first utilizes partial order pairs in the target domain to train a reward model, which aims to capture human preference intrinsic to the specific scenario. Furthermore, we meticulously design a state representation and a policy loss tailored for ranking tasks, enabling LR²PPO to boost the performance of the label relevance ranking model and largely reduce the requirement of partial order annotation for transferring to new scenes. To assist in the evaluation of our approach and similar methods, we further propose a novel benchmark dataset, LRMovieNet, featuring multimodal labels and their corresponding partial order data. Extensive experiments demonstrate that our LR²PPO algorithm achieves state-of-the-art performance, proving its effectiveness in addressing the multimodal label relevance ranking problem. Codes and the proposed LRMovieNet dataset are publicly available at https://github.com/ChazzyGordon/LR2PPO.

What problem does this paper attempt to address?

The problem this paper addresses is multimodal label relevance ranking. Traditional multi-label recognition methods usually focus on label confidence and ignore the partial order relationships that are consistent with human preferences. To fill this gap, the authors propose a new method, Label Relevance Ranking with Proximal Policy Optimization (LR2PPO), which aims to distinguish the partial order relationships between labels and bring the model's rankings closer to human preferences.

### Core of the problem

1. **Label confidence vs. label relevance**: Traditional methods mainly focus on the confidence of labels (i.e., the probability that a given label applies to the input) but ignore how relevant each label is to the input. For example, in a movie scene the label "person" may have high confidence, while the label "flirt" may better match the theme of the scene and therefore be more relevant.
2. **Partial order relationships**: When humans interpret multimodal data, they often impose a partial order on labels to reflect their importance or priority. Existing methods fail to fully capture such partial order relationships.
3. **Transfer learning**: Transferring label relevance ranking ability from existing scenarios to new ones is also a challenge. New scenarios may contain new labels or video clips, and annotating this data is costly.

### Solutions

To solve the above problems, the authors propose the LR2PPO algorithm. Its main contributions are:

1. **Introducing partial order relationships**: Partial order pairs in the target domain are used to train a reward model, so that the model better captures the human preferences intrinsic to that scenario.
2. **Designing a state representation and policy loss function suited to ranking tasks**: By redefining the state and the policy loss function, LR2PPO can effectively mine the partial order relationships between labels, improving label relevance ranking performance.
3. **Creating a benchmark dataset**: To evaluate LR2PPO, the authors construct a new multimodal label relevance ranking benchmark, LRMovieNet, which contains multimodal labels and their corresponding partial order data.

### Method overview

- **Stage 1**: Train a base label relevance ranking model (the Actor) on the source domain.
- **Stage 2**: Train a reward model using a small number of partial order pair annotations from the target domain together with augmented partial order pair samples from the source domain.
- **Stage 3**: Jointly train the Actor and Critic models, using the reward model to guide the Actor network toward the partial order relationships of the target domain.

In this way, LR2PPO not only improves label relevance ranking performance but also reduces the need for partial order annotations, making the model easier to transfer to new scenarios.

### Experimental results

The experiments show that LR2PPO outperforms existing methods on multiple metrics, which supports its effectiveness on the multimodal label relevance ranking problem.

### Summary

This paper addresses the multimodal label relevance ranking problem by combining partial order relationships with reinforcement learning, and it provides new ideas and tools for future research.
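The reward-model and policy-update stages described in the method overview can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not specify the reward model's loss or the ranking-specific policy loss, so this sketch assumes a Bradley-Terry style pairwise loss for Stage 2 and the standard PPO clipped surrogate for Stage 3. The function names, the example scores, and the clipping value `eps=0.2` are all hypothetical.

```python
import math


def pairwise_reward_loss(score_preferred: float, score_other: float) -> float:
    """Assumed pairwise loss: -log(sigmoid(score_preferred - score_other)).

    The loss is small when the label humans prefer receives the higher score.
    """
    margin = score_preferred - score_other
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def ppo_clipped_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one sample (to be maximized).

    `ratio` is new_policy_prob / old_policy_prob for the chosen action.
    The paper's ranking-specific policy loss may differ from this form.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def rank_labels(scores: dict[str, float]) -> list[str]:
    """Order labels from most to least relevant by their scores."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical relevance scores for the movie-scene example above.
    scores = {"person": 0.4, "flirt": 1.3}
    print(rank_labels(scores))
    # Annotated partial order pair: "flirt" is preferred over "person".
    print(pairwise_reward_loss(scores["flirt"], scores["person"]))
    print(ppo_clipped_objective(ratio=1.5, advantage=1.0))
```

Under these assumptions, the pairwise loss only needs to know which label of a pair is preferred, not an absolute relevance value, which is consistent with the summary's point that a small number of partial order annotations can supervise the reward model.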

Multimodal Label Relevance Ranking via Reinforcement Learning
