Abstract: Learning to rank, which learns the ranking function from training data, has become an emerging research area in information retrieval and machine learning. Most existing work on learning to rank assumes that the training data is clean; however, this is not always true. The ambiguity of query intent, the lack of domain knowledge, and the vague definition of relevance levels all make it difficult for ordinary annotators to give reliable relevance labels to some documents. As a result, the relevance labels in learning-to-rank training data usually contain noise, and ignoring this fact degrades the performance of learning-to-rank algorithms. In this article, we propose accounting for labeling noise during learning to rank, using a two-step approach that extends existing algorithms to handle noisy training data. In the first step, we estimate the degree of labeling noise for each training document. To this end, we assume that the majority of the relevance labels in the training data are reliable, and we use a graphical model to describe the generative process of a training query, the feature vectors of its associated documents, and the relevance labels of these documents. The parameters of the graphical model are learned by maximum likelihood estimation. We then compute the conditional probability of the relevance label given the feature vector of a document. A large probability indicates a small degree of labeling noise for that document; a small probability indicates a large degree. In the second step, we extend existing learning-to-rank algorithms by incorporating the estimated degree of labeling noise into their loss functions. Specifically, training documents with smaller degrees of labeling noise receive larger weights, and those with larger degrees receive smaller weights. As examples, we demonstrate extensions of McRank, RankSVM, RankBoost, and RankNet.
Empirical results on benchmark datasets show that the proposed approach can effectively distinguish noisy documents from clean ones, and that the extended learning-to-rank algorithms achieve better performance than the baselines.
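The two steps described in the abstract can be illustrated with a small, self-contained Python sketch. It is a simplified illustration under stated assumptions, not the authors' method: the graphical model is replaced by a much simpler per-label diagonal Gaussian fitted by maximum likelihood (queries are ignored), a document's weight is taken to be P(y | x) itself, a pair's weight is the product of its two document weights, and the ranker is a linear scorer trained with a RankNet-style pairwise logistic loss. The function names `fit_gaussian_model`, `label_posterior`, and `train_weighted_ranknet`, and the toy data, are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_model(features, labels, var_floor=1e-3):
    """Maximum likelihood fit of a label prior P(y) and a diagonal Gaussian p(x | y).

    Hypothetical stand-in for the paper's graphical model: queries are ignored
    and feature dimensions are treated as conditionally independent.
    """
    groups = defaultdict(list)
    for x, y in zip(features, labels):
        groups[y].append(x)
    n = len(features)
    model = {}
    for y, xs in groups.items():
        dim = len(xs[0])
        mean = [sum(x[d] for x in xs) / len(xs) for d in range(dim)]
        # The floor keeps a variance from collapsing to zero on tiny groups.
        var = [max(sum((x[d] - mean[d]) ** 2 for x in xs) / len(xs), var_floor)
               for d in range(dim)]
        model[y] = (len(xs) / n, mean, var)
    return model


def label_posterior(model, x, y):
    """P(y | x) by Bayes' rule; a low value suggests a noisy label."""
    def log_joint(label):
        prior, mean, var = model[label]
        total = math.log(prior)
        for xd, m, v in zip(x, mean, var):
            total += -0.5 * math.log(2 * math.pi * v) - (xd - m) ** 2 / (2 * v)
        return total

    logs = {label: log_joint(label) for label in model}
    top = max(logs.values())  # subtract the max for numerical stability
    norm = sum(math.exp(lj - top) for lj in logs.values())
    return math.exp(logs[y] - top) / norm


def _sigmoid(z):
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_weighted_ranknet(queries, weights, dim, lr=0.1, epochs=200):
    """Linear scorer trained by gradient descent on a weighted pairwise loss.

    Each pair (i, j) in a query with labels[i] > labels[j] contributes
    w_i * w_j * log(1 + exp(-(s_i - s_j))), where s = theta . x.
    """
    theta = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for (xs, ys), ws in zip(queries, weights):
            for i in range(len(xs)):
                for j in range(len(xs)):
                    if ys[i] <= ys[j]:
                        continue
                    diff = [a - b for a, b in zip(xs[i], xs[j])]
                    margin = sum(t * d for t, d in zip(theta, diff))
                    # Derivative of log(1 + exp(-m)) w.r.t. m is -sigmoid(-m).
                    coef = -ws[i] * ws[j] * _sigmoid(-margin)
                    for d in range(dim):
                        grad[d] += coef * diff[d]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Toy query: the last document is labeled relevant but looks irrelevant.
docs = [[0.0], [0.2], [0.1], [1.0], [1.1], [0.9], [0.05]]
rels = [0, 0, 0, 1, 1, 1, 1]
model = fit_gaussian_model(docs, rels)
conf = [label_posterior(model, x, y) for x, y in zip(docs, rels)]
theta = train_weighted_ranknet([(docs, rels)], [conf], dim=1)
print([round(c, 3) for c in conf], theta)
```

On this toy query the suspicious last document receives a much smaller weight than the others, so its pairs contribute little to the loss. Using P(y | x) directly as the weight is only one possible mapping; any weight that decreases as the estimated degree of labeling noise grows matches the idea described in the abstract.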

Inconsistency Ranking-based Noisy Label Detection for High-quality Data

A Label Noise Robust Stacked Auto-Encoder Algorithm for Inaccurate Supervised Classification Problems

Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective

Improving Speaker Verification with Noise-Aware Label Ensembling and Sample Selection: Learning and Correcting Noisy Speaker Labels

CEC: A Noisy Label Detection Method for Speaker Recognition

Learning with Noisy Labels Via Self-supervised Adversarial Noisy Masking

Robust Training for Speaker Verification Against Noisy Labels

Learning to Rank from Noisy Data

ENLD: Efficient Noisy Label Detection for Incremental Datasets in Data Lake

RankMatch: Fostering Confidence and Consistency in Learning with Noisy Labels

Model-agnostic Approaches to Handling Noisy Labels When Training Sound Event Classifiers

NoiseRank: Unsupervised Label Noise Reduction with Dependence Models

NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification

Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations

Noisy Label Processing for Classification: A Survey

Countering Noisy Labels by Learning from Auxiliary Clean Labels

Remote Sensing Image Scene Classification with Noisy Label Distillation

Learning With Noisy Labels Over Imbalanced Subpopulations

DAT: Training Deep Networks Robust to Label-Noise by Matching the Feature Distributions

BPT-PLR: A Balanced Partitioning and Training Framework with Pseudo-Label Relaxed Contrastive Loss for Noisy Label Learning

Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling