Abstract: We consider the problem of sequential evaluation, in which an evaluator observes candidates in a sequence and assigns scores to these candidates in an online, irrevocable fashion. Motivated by the psychology literature that has studied sequential bias in such settings -- namely, dependencies between the evaluation outcome and the order in which the candidates appear -- we propose a natural model for the evaluator's rating process that captures the lack of calibration inherent to such a task. We conduct crowdsourcing experiments to demonstrate various facets of our model. We then proceed to study how to correct sequential bias under our model by posing this as a statistical inference problem. We propose a near-linear time, online algorithm for this task and prove guarantees in terms of two canonical ranking metrics. We also prove that our algorithm is information-theoretically optimal, by establishing matching lower bounds in both metrics. Finally, we perform a host of numerical experiments to show that our algorithm often outperforms the de facto method of using the rankings induced by the reported scores, both in simulation and on the crowdsourcing data that we collected.

What problem does this paper attempt to address?

This paper addresses **sequential bias in sequential evaluation**. Specifically, the authors focus on how the scores that evaluators give to candidates are affected by the order in which the candidates appear. This bias can lead to unfair or inaccurate evaluation results, especially in high-stakes settings such as sports competitions and court judgments.

#### 1. **Research Background**

In many evaluation scenarios, evaluators score candidates one after another, and the scores are irrevocable. For example, in sports competitions, referees give a score immediately after each athlete's performance, and these scores determine the final ranking. Research shows that a candidate's position in the sequence has a significant impact on their score; this effect is called **sequential bias**. It may put candidates who appear early at a disadvantage, while candidates who appear later may receive higher scores.

#### 2. **Research Motivation**

The existence of sequential bias is supported by empirical research in multiple fields, including sports such as figure skating, gymnastics, diving, and synchronized swimming. In these competitions, sequential bias can make scoring unfair, which in turn affects athletes' careers. Correcting sequential bias is therefore crucial for fair and accurate evaluation.

#### 3. **Research Objectives**

The authors propose a new model of the scoring process in sequential evaluation and design an algorithm to correct sequential bias. Specifically, their objectives include:

- **Modeling sequential bias**: Propose a mathematical model that captures how scores change over the course of a sequential evaluation.
- **Correcting sequential bias**: Develop an efficient algorithm that can estimate the true ranking of candidates in the presence of noise.
- **Verifying the effectiveness of the model**: Demonstrate the effectiveness of the proposed model and algorithm through experiments and theoretical analysis.

#### 4. **Main Contributions**

- **Modeling method**: The authors propose a general scoring model that takes into account the influence of a candidate's position and relative ranking on the score.
- **Algorithm design**: They propose an online algorithm with near-linear time complexity to correct sequential bias and prove that it is optimal with respect to two canonical ranking metrics.
- **Experimental verification**: They evaluate the model and algorithm through crowdsourcing experiments and numerical simulations, which indicate that the algorithm often outperforms the de facto method of ranking candidates by their reported scores.

In conclusion, this paper seeks to understand and correct sequential bias in sequential evaluation through modeling and algorithm design, thereby improving the fairness and accuracy of evaluation.
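To make the baseline concrete, the sketch below shows the de facto method mentioned in the abstract (ranking candidates by their reported scores) and two ways of measuring how far such a ranking is from a reference ranking. It does not reproduce the paper's model or algorithm, which this summary does not specify. The abstract does not name its two ranking metrics, so Kendall tau distance and Spearman's footrule are used here only as common examples; the function names, the scores, and the reference ranking are all hypothetical.

```python
from itertools import combinations


def ranking_from_scores(scores):
    """Baseline: candidate indices ordered from highest to lowest reported score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def kendall_tau_distance(order_a, order_b):
    """Number of candidate pairs that the two orderings rank in opposite order."""
    pos_a = {c: r for r, c in enumerate(order_a)}
    pos_b = {c: r for r, c in enumerate(order_b)}
    return sum(
        1
        for x, y in combinations(order_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def spearman_footrule(order_a, order_b):
    """Sum over candidates of the absolute difference in rank position."""
    pos_a = {c: r for r, c in enumerate(order_a)}
    pos_b = {c: r for r, c in enumerate(order_b)}
    return sum(abs(pos_a[c] - pos_b[c]) for c in order_a)


# Hypothetical data: scores listed in order of appearance, and an assumed
# reference ranking (best to worst) that the evaluator never observes directly.
reported_scores = [6.0, 7.5, 7.0, 8.0]
reference_order = [0, 3, 1, 2]

induced_order = ranking_from_scores(reported_scores)
print("ranking induced by reported scores:", induced_order)
print("Kendall tau distance:", kendall_tau_distance(induced_order, reference_order))
print("Spearman footrule:", spearman_footrule(induced_order, reference_order))
```

In this toy example the first candidate to appear is the best in the reference ranking but receives the lowest score, so the score-induced ranking places them last. This is the kind of order-dependent error that a correction method would aim to reduce.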

Modeling and Correcting Bias in Sequential Evaluation

Debiasing Evaluations That are Biased by Evaluations

Bias-aware ranking from pairwise comparisons

Bias in Evaluation Processes: An Optimization-Based Model

Correcting the User Feedback-Loop Bias for Recommendation Systems

Rethinking the Evaluation of Unbiased Scene Graph Generation

Mitigating the Bias of Large Language Model Evaluation

OffsetBias: Leveraging Debiased Data for Tuning Evaluators

CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges

Discovering Bias in Latent Space: An Unsupervised Debiasing Approach

Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility

De-biasing "bias" measurement

Sequential Voting Promotes Collective Discovery in Social Recommendation Systems

Efficient Lifelong Model Evaluation in an Era of Rapid Progress

Unbiased Sequential Recommendation with Latent Confounders

Measuring Recency Bias In Sequential Recommendation Systems

Large Language Models are not Fair Evaluators

Unbiased Comparative Evaluation of Ranking Functions

Unbiased Learning-to-Rank with Biased Feedback

Combating Unknown Bias with Effective Bias-Conflicting Scoring and Gradient Alignment

Balancing Unobserved Confounding with a Few Unbiased Ratings in Debiased Recommendations