ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs

Fernando Ferraretto, Thiago Laitz, Roberto Lotufo, Rodrigo Nogueira

2024-02-10

Abstract: ExaRanker recently introduced an approach to training information retrieval (IR) models, incorporating natural language explanations as additional labels. The method addresses the challenge of limited labeled examples, leading to improvements in the effectiveness of IR models. However, the initial results were based on proprietary language models such as GPT-3.5, which posed constraints on dataset size due to their cost and data privacy concerns. In this paper, we introduce ExaRanker-Open, where we adapt and explore the use of open-source language models to generate explanations. The method has been tested using different LLMs and dataset sizes to better comprehend the effective contribution of data augmentation. Our findings reveal that incorporating explanations consistently enhances neural rankers, with benefits escalating as the LLM size increases. Notably, the data augmentation method proves advantageous even with large datasets, as evidenced by ExaRanker surpassing the target baseline by 0.6 nDCG@10 points in our study. To encourage further advancements by the research community, we have open-sourced both the code and datasets at https://github.com/unicamp-dl/ExaRanker.
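The abstract reports its headline gain in nDCG@10. As a reminder of what that metric measures, here is a minimal sketch in Python. It assumes the linear-gain variant of DCG (some evaluation tools use an exponential gain of 2^rel - 1 instead) and assumes the input list contains every judged document for the query, so that sorting it yields the ideal ranking. The function names are illustrative and are not taken from the ExaRanker repository.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    # Rank positions are 1-based, so the discount for index i is log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    # Assumption: `relevances` covers all judged documents for the query,
    # so sorting it in descending order gives the ideal ordering.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Graded relevance labels in the order the ranker returned the documents.
    ranking = [0, 1]
    print(round(ndcg_at_k(ranking), 4))
```

A ranking that already places the most relevant documents first scores 1.0; swapping a relevant document below a non-relevant one lowers the score because of the logarithmic position discount.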

Artificial Intelligence, Computation and Language, Information Retrieval

What problem does this paper attempt to address?

The paper addresses the scarcity of labeled training data in Information Retrieval (IR). The authors augment training datasets with natural language explanations generated by open-source large language models (LLMs). This avoids the cost and data privacy constraints of earlier work that relied on proprietary language models (such as GPT-3.5), and it lets them test whether adding explanations keeps improving neural rankers across different dataset sizes.

The main contributions of the paper are:

1. **Introduction of ExaRanker-Open**: An adaptation of ExaRanker that uses open-source language models to generate the natural language explanations added to the training data of information retrieval models.
2. **Validation of data augmentation effectiveness**: Experiments show that adding explanations consistently improves neural rankers, including on large datasets, with larger gains when the explanations come from larger language models.
3. **Public release of code and datasets**: To support further research, the authors have made the code and datasets publicly available.

Together, these results show that open-source language models are a viable source of explanation-based data augmentation for information retrieval.
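To make the idea of "explanations as additional labels" concrete, the following Python sketch builds a text-to-text training pair in which the target holds the relevance label followed by an explanation. The prompt template, the `true`/`false` label words, and the `RankingExample` type are assumptions chosen for illustration; the page above does not specify the exact format used by ExaRanker-Open.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RankingExample:
    """Hypothetical container for one labeled query-passage pair."""
    query: str
    passage: str
    relevant: bool
    explanation: str = ""  # e.g. text generated by an open-source LLM


def build_training_pair(ex: RankingExample, use_explanation: bool = True) -> Tuple[str, str]:
    """Return (source, target) strings for a sequence-to-sequence ranker.

    Assumed template: the label comes first in the target, so a ranker
    could still be scored from the label token alone at inference time.
    """
    source = f"Query: {ex.query} Passage: {ex.passage} Relevant:"
    label = "true" if ex.relevant else "false"
    if use_explanation and ex.explanation:
        target = f"{label}. Explanation: {ex.explanation}"
    else:
        # Baseline without augmentation: the label alone.
        target = label
    return source, target


if __name__ == "__main__":
    example = RankingExample(
        query="what is bm25",
        passage="BM25 is a ranking function used by search engines.",
        relevant=True,
        explanation="The passage defines BM25 directly.",
    )
    print(build_training_pair(example))
    print(build_training_pair(example, use_explanation=False))
```

Keeping the baseline and augmented targets behind one flag makes it straightforward to compare a ranker trained with and without explanations on the same examples.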

Evaluating the Explainability of Neural Rankers

OpenXAI: Towards a Transparent Evaluation of Model Explanations

EXS: Explainable Search Using Local Model Agnostic Interpretability

ir_explain: a Python Library of Explainable IR Methods

Unlocking the Potential of Large Language Models for Explainable Recommendations

Explain then Rank: Scale Calibration of Neural Rankers Using Natural Language Explanations from LLMs

EXACT: Towards a platform for empirically benchmarking Machine Learning model explanation methods

ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation

Large Language Models as Evaluators for Recommendation Explanations

Trusting deep learning natural-language models via local and global explanations

Explanations Based on Item Response Theory (eXirt): A Model-Specific Method to Explain Tree-Ensemble Model in Trust Perspective

XplainLLM: A Knowledge-Augmented Dataset for Reliable Grounded Explanations in LLMs

OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models

Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation

RecExplainer: Aligning Large Language Models for Explaining Recommendation Models

From Feature Importance to Natural Language Explanations Using LLMs with RAG

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models

Using Natural Language Explanations to Rescale Human Judgments

XRec: Large Language Models for Explainable Recommendation