Abstract: LLM-as-a-judge models have been used for evaluating both human and AI generated content, specifically by providing scores and rationales. Rationales, in addition to increasing transparency, help models learn to calibrate their judgments. Enhancing a model's rationale can therefore improve its calibration abilities and ultimately its ability to score content. We introduce Self-Rationalization, an iterative process of improving the rationales for the judge models, which consequently improves the score for fine-grained customizable scoring criteria (i.e., Likert-scale scoring with arbitrary evaluation criteria). Self-rationalization works by having the model generate multiple judgments with rationales for the same input, curating a preference pair dataset from its own judgments, and iteratively fine-tuning the judge via DPO. Intuitively, this approach allows the judge model to self-improve by learning from its own rationales, leading to better alignment and evaluation accuracy. After just two iterations -- while only relying on examples in the training set -- human evaluation shows that our judge model learns to produce higher quality rationales, with a win rate of $62\%$ on average compared to models just trained via SFT on rationale. This judge model also achieves high scoring accuracy on BigGen Bench and Reward Bench, outperforming even larger models trained using SFT with rationale, self-consistency or best-of-$N$ sampling by $3\%$ to $9\%$.

What problem does this paper attempt to address?

This paper addresses how to make large language models (LLMs) better judges, specifically at producing scores and the rationales behind them. It introduces a method named "Self-Rationalization": the judge generates multiple judgments with rationales for each input, a preference-pair dataset is built from those judgments, and the model is fine-tuned with direct preference optimization (DPO), with the whole process repeated iteratively. This improves both the model's scoring under fine-grained, customizable criteria and the quality of its rationales.

### Main Problems

1. **Improving Scoring Accuracy**: Existing LLM-as-a-judge models fall short on scoring accuracy, especially on tasks that require fine-grained scoring criteria.
2. **Enhancing Rationale Quality**: Rationales generated by existing models are often not detailed or accurate enough, which undermines the transparency and credibility of the models.
3. **Reducing Dependence on Human-Annotated Data**: Traditional training methods rely on large amounts of human-annotated data, which is costly and difficult to scale in practice.

### Solutions

The paper proposes a training method, "Self-Rationalization", with the following steps:

1. **Seed Initialization**: Start from a supervised fine-tuned judge model ($J_{SFT}$) that has been trained on the initial annotated dataset.
2. **Self-Rationalization**: For each input, generate multiple judgments, each containing a score and a rationale.
3. **Preference Data Curation**: From the generated judgments, select high-quality and low-quality ones to form a preference-pair dataset.
4. **Preference Optimization**: Fine-tune the model with DPO on these pairs to improve its ability to generate high-quality rationales and accurate scores.

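The curation and optimization steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `Judgment` type, the `curate_pair` helper, and the selection rule (chosen = a sample whose score matches a reference score from the training set, rejected = the sample furthest from it) are assumptions made for the example. The `dpo_loss` function is the standard per-pair DPO objective, with `beta=0.1` as an arbitrary illustrative default.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Judgment:
    """One sampled judgment (hypothetical container for this sketch)."""
    rationale: str  # free-text explanation produced by the judge
    score: int      # Likert-scale score, e.g. 1 to 5


def curate_pair(
    samples: List[Judgment], reference_score: int
) -> Optional[Tuple[Judgment, Judgment]]:
    """Pick a (chosen, rejected) pair from N sampled judgments for one input.

    Assumed rule, for illustration only: the chosen judgment matches the
    reference score, and the rejected one has the score furthest from it.
    Returns None when the samples do not contain both kinds.
    """
    correct = [j for j in samples if j.score == reference_score]
    wrong = [j for j in samples if j.score != reference_score]
    if not correct or not wrong:
        return None
    rejected = max(wrong, key=lambda j: abs(j.score - reference_score))
    return correct[0], rejected


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under the frozen reference model (here, the previous judge).
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) == softplus(-margin), computed stably.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


samples = [
    Judgment("Mostly accurate, minor gaps.", 3),
    Judgment("Flawless response.", 5),
    Judgment("Completely off-topic.", 1),
    Judgment("Accurate with one omission.", 4),
]
pair = curate_pair(samples, reference_score=4)
```

In an iterative setup, pairs collected this way over the training set would be used for one round of DPO fine-tuning, and the resulting judge would generate the samples for the next round.
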
### Experimental Results

- **Performance Improvement**: After two iterations of self-rationalization, the model improves on multiple benchmarks, especially on fine-grained scoring tasks. On BigGen Bench and Reward Bench it outperforms even larger models trained using SFT with rationale, self-consistency, or best-of-$N$ sampling by 3% to 9%.
- **Rationale Quality**: Human evaluation shows that the model after self-rationalization generates higher-quality rationales, with an average win rate of 62% against models trained only via SFT on rationales.
- **Resource Efficiency**: Compared with the traditional supervised fine-tuning (SFT) method, the self-rationalization method requires fewer training samples and computing resources and converges faster.

### Conclusion

Through self-rationalization, the paper improves the performance of LLM-as-a-judge models on fine-grained scoring tasks, especially in generating high-quality rationales. The method improves the model's scoring accuracy while reducing its dependence on human-annotated data, which makes it valuable for practical applications.

Self-rationalization improves LLM as a fine-grained judge
