Abstract: It has long been recognized that it is not enough for a Recommender System (RS) to provide recommendations based only on their relevance to users. Among many other criteria, the set of recommendations may need to be diverse. Diversity is one way of handling recommendation uncertainty and ensuring that recommendations offer users a meaningful choice. The literature reports many ways of measuring diversity and of improving the diversity of a set of recommendations, most notably by re-ranking and selecting from a larger set of candidate recommendations. Driven by promising insights from the literature on how to incorporate versatile Large Language Models (LLMs) into the RS pipeline, in this paper we show how LLMs can be used for diversity re-ranking. We begin with an informal study that verifies that LLMs can be used for re-ranking tasks and have some understanding of the concept of item diversity. Then, we design a more rigorous methodology in which LLMs are prompted, in a zero-shot fashion, to generate a diverse ranking from a candidate ranking, using various prompt templates with different re-ranking instructions. We conduct comprehensive experiments testing state-of-the-art LLMs from the GPT and Llama families. We compare their re-ranking capabilities with random re-ranking and with various traditional re-ranking methods from the literature. We open-source the code of our experiments for reproducibility. Our findings suggest that the trade-offs (in terms of performance and costs, among others) of LLM-based re-rankers are superior to those of random re-rankers but, as yet, inferior to those of traditional re-rankers. However, the LLM approach is promising. LLMs are showing improving performance on many natural language processing and recommendation tasks, along with falling inference costs. Given these trends, we can expect LLM-based re-ranking to become more competitive soon.
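To make the idea of diversity re-ranking from a larger candidate set concrete, the following is a minimal sketch of one greedy, relevance-versus-novelty re-ranker in the spirit of the traditional methods the abstract mentions. It is an illustration only and is not taken from the paper: the function names, the use of Jaccard distance over item feature sets (for example, genres), the trade-off weight `lam`, and the toy data are all assumptions.

```python
from typing import Dict, List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def greedy_diverse_rerank(
    candidates: List[str],
    relevance: Dict[str, float],
    features: Dict[str, Set[str]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick k items, trading relevance against novelty.

    Hypothetical helper: at each step the item maximizing
    lam * relevance + (1 - lam) * novelty is selected, where novelty is
    the minimum Jaccard distance to the items already selected.
    """
    selected: List[str] = []
    remaining = list(candidates)

    def score(item: str) -> float:
        if not selected:
            novelty = 1.0  # nothing chosen yet, so every item is fully novel
        else:
            novelty = min(
                jaccard_distance(features[item], features[s]) for s in selected
            )
        return lam * relevance[item] + (1.0 - lam) * novelty

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed): two action films and one comedy.
toy_relevance = {"a": 0.9, "b": 0.8, "c": 0.7}
toy_features = {"a": {"action"}, "b": {"action"}, "c": {"comedy"}}
print(greedy_diverse_rerank(["a", "b", "c"], toy_relevance, toy_features, k=2))
```

With `lam=1.0` the sketch reduces to ranking by relevance alone; lowering `lam` favors items that differ from those already selected. An LLM-based re-ranker, as studied in the paper, would replace this explicit scoring with a prompt that asks the model to reorder the candidate list.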

Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers

LLM-enhanced Reranking in Recommender Systems

Enhancing Recommendation Diversity by Re-ranking with Large Language Models

Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages

Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge

Self-Calibrated Listwise Reranking with Large Language Models

MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria

LLM-RankFusion: Mitigating Intrinsic Inconsistency in LLM-based Ranking

A Study of Implicit Ranking Unfairness in Large Language Models

TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy

Benchmarking Linguistic Diversity of Large Language Models

Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models

Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat

A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models

DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task

Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers

PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations

Different Bias Under Different Criteria: Assessing Bias in LLMs with a Fact-Based Approach

Rule-based Data Selection for Large Language Models

LINKAGE: Listwise Ranking among Varied-Quality References for Non-Factoid QA Evaluation via LLMs

LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation