Abstract: Despite their satisfactory performance, most existing listwise Learning-To-Rank (LTR) models do not address the crucial issue of robustness. A data set can be contaminated in various ways, including human error in labeling or annotation, distributional data shift, and malicious adversaries who wish to degrade the algorithm's performance. Distributionally Robust Optimization (DRO) has been shown to be resilient against various types of noise and perturbations. To fill this gap, we build on DRO and introduce a new listwise LTR model called Distributionally Robust Multi-output Regression Ranking (DRMRR). Unlike existing methods, the scoring function of DRMRR is designed as a multivariate mapping from a feature vector to a vector of deviation scores, which captures local context information and cross-document interactions. This design allows us to incorporate LTR metrics into the model. DRMRR uses a Wasserstein DRO framework to minimize a multi-output loss function under the most adverse distributions in a neighborhood of the empirical data distribution, where the neighborhood is defined by a Wasserstein ball. We present a compact and computationally tractable reformulation of the min-max formulation of DRMRR. Experiments on two real-world applications, medical document retrieval and drug response prediction, show that DRMRR notably outperforms state-of-the-art LTR models. We also conduct an extensive analysis of the resilience of DRMRR against three types of noise: Gaussian noise, adversarial perturbations, and label poisoning. The results show that DRMRR not only achieves significantly better performance than the baselines but also maintains relatively stable performance as more noise is added to the data.
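The abstract does not spell out the min-max reformulation. To show the general idea behind turning a Wasserstein DRO problem into a single minimization, here is a minimal sketch under assumptions that are ours, not the paper's: a linear multi-output scorer s = B^T x, a per-sample loss ||y - B^T x||_1, and a type-1 Wasserstein ball of radius eps with an l1 transport cost on the stacked vector z = (x, y). Under these assumptions, Kantorovich-Rubinstein duality bounds the worst-case expected loss by the empirical loss plus eps times the loss's Lipschitz constant, and that bound is attained as a supremum when the data support is unbounded. The helper names (`predict`, `l1_loss`, `lipschitz_l1`, `dro_objective`, `fit`) are hypothetical, and deviation scores are not modeled; this is not the DRMRR formulation itself.

```python
"""Illustrative sketch of a Wasserstein-DRO objective for multi-output regression.

Assumptions (hypothetical, not the exact DRMRR formulation):
  * linear multi-output scorer s = B^T x, with B of shape p x K;
  * per-sample loss ||y - B^T x||_1;
  * type-1 Wasserstein ball of radius eps, l1 transport cost on z = (x, y).
"""
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def predict(B: Matrix, x: Vector) -> Vector:
    """Return the K-dimensional score vector B^T x."""
    return [sum(B[j][k] * x[j] for j in range(len(x))) for k in range(len(B[0]))]


def l1_loss(B: Matrix, x: Vector, y: Vector) -> float:
    """Multi-output absolute loss ||y - B^T x||_1 for one sample."""
    return sum(abs(yk - sk) for yk, sk in zip(y, predict(B, x)))


def lipschitz_l1(B: Matrix) -> float:
    """Lipschitz constant of z -> ||[-B^T, I] z||_1 under the l1 norm.

    This is the largest column l1 norm of [-B^T, I], i.e.
    max(1, max_j sum_k |B[j][k]|).
    """
    return max([1.0] + [sum(abs(b) for b in row) for row in B])


def dro_objective(B: Matrix, X: List[Vector], Y: List[Vector], eps: float) -> float:
    """Empirical loss + eps * Lipschitz constant.

    Upper-bounds the worst-case expected loss over the type-1 Wasserstein
    ball of radius eps; with unbounded support the bound is attained as a
    supremum.
    """
    emp = sum(l1_loss(B, x, y) for x, y in zip(X, Y)) / len(X)
    return emp + eps * lipschitz_l1(B)


def fit(X: List[Vector], Y: List[Vector], eps: float,
        steps: int = 500, lr: float = 0.1) -> Tuple[Matrix, float]:
    """Minimize dro_objective by plain subgradient descent, keeping the best iterate."""
    p, K, n = len(X[0]), len(Y[0]), len(X)
    B = [[0.0] * K for _ in range(p)]
    best_B, best_val = [row[:] for row in B], dro_objective(B, X, Y, eps)
    for t in range(1, steps + 1):
        G = [[0.0] * K for _ in range(p)]
        # Subgradient of the empirical term: -sign(residual_k) * x_j / n.
        for x, y in zip(X, Y):
            s = predict(B, x)
            for k in range(K):
                sign = (y[k] > s[k]) - (y[k] < s[k])
                for j in range(p):
                    G[j][k] -= sign * x[j] / n
        # Subgradient of eps * max(1, max_j ||B_j||_1): only the active row contributes.
        norms = [sum(abs(b) for b in row) for row in B]
        j_star = max(range(p), key=lambda j: norms[j])
        if norms[j_star] > 1.0:
            for k in range(K):
                b = B[j_star][k]
                G[j_star][k] += eps * ((b > 0) - (b < 0))
        step = lr / t ** 0.5  # diminishing step size
        B = [[B[j][k] - step * G[j][k] for k in range(K)] for j in range(p)]
        val = dro_objective(B, X, Y, eps)
        if val < best_val:
            best_B, best_val = [row[:] for row in B], val
    return best_B, best_val
```

For example, with `B = [[1.0, 0.0], [0.0, 2.0]]`, `X = [[1.0, 1.0], [0.0, 1.0]]`, and `Y = [[1.0, 2.0], [0.0, 3.0]]`, the empirical loss is 0.5 and the Lipschitz constant is 2, so `dro_objective(B, X, Y, 0.1)` evaluates to 0.7 (up to floating-point rounding). The radius eps acts as a regularization weight: a larger ball penalizes large row norms of B more heavily, which is how robustness to perturbed inputs enters the single-level problem.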

Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval

Towards Scalable and Fast Distributionally Robust Optimization for Data-Driven Deep Learning

DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models

Robust Prompt Optimization for Large Language Models Against Distribution Shifts

Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning

ScalingNote: Scaling up Retrievers with Large Language Models for Real-World Dense Retrieval

DORO: Distributional and Outlier Robust Optimization

Disentangled Modeling of Domain and Relevance for Adaptable Dense Retrieval

UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers

Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment

Leveraging LLMs for Unsupervised Dense Retriever Ranking

Task-Distributionally Robust Data-Free Meta-Learning

Finetuning Large Language Model for Personalized Ranking

Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently

LLM-Oriented Retrieval Tuner

Data-Efficient Massive Tool Retrieval: A Reinforcement Learning Approach for Query-Tool Alignment with Language Models

Distributionally Robust Learning-to-rank under the Wasserstein Metric

Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining