Abstract: The design of a Web search evaluation metric is closely related to how the user's interaction process is modeled. Each behavioral model results in a different metric for evaluating search performance. In these models and the user behavior assumptions behind them, when a user ends a search session is one of the prime concerns because it is highly related to both benefit and cost estimation. Existing metric designs usually adopt simplified criteria to decide the stopping point: (1) an upper limit for benefit (e.g., RR, AP); (2) an upper limit for cost (e.g., Precision@N, DCG@N). However, in many practical search sessions (e.g., exploratory search), the stopping criterion is more complex than these simplified cases. Analyzing the benefit and cost of actual users' search sessions, we find that the stopping criteria vary with search tasks and usually reflect the combined effect of both benefit and cost factors. Inspired by a popular computer game named Bejeweled, we propose a Bejeweled Player Model (BPM) to simulate users' search interaction processes and evaluate their search performance. In the BPM, a user stops when he/she either has found sufficient useful information or has no more patience to continue. Given this assumption, we propose a new evaluation framework based on upper limits (either fixed or changeable as the search proceeds) for both benefit and cost. We show how to derive a new metric from the framework and demonstrate that it can be adopted to revise traditional metrics such as Discounted Cumulative Gain (DCG), Expected Reciprocal Rank (ERR) and Average Precision (AP). To show the effectiveness of the proposed framework, we compare it with a number of existing metrics in terms of the correlation between user satisfaction and the metrics, using a dataset that collects users' explicit satisfaction feedback and assessors' relevance judgements. Experimental results show that the framework is better correlated with user satisfaction feedback.
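The stopping assumption described in the abstract (stop once enough benefit has been gathered or once patience runs out) can be illustrated with a minimal sketch. This is not the paper's metric definition: the function names `stopping_rank` and `bounded_gain`, the fixed limits, and the per-result gain and cost values are all hypothetical, chosen only to show how two upper limits interact.

```python
from typing import Sequence


def stopping_rank(gains: Sequence[float], costs: Sequence[float],
                  benefit_limit: float, cost_limit: float) -> int:
    """Return how many results a simulated user examines before stopping.

    Assumption (illustrative only): the user checks both limits before
    examining each result and stops as soon as either one is reached.
    """
    if len(gains) != len(costs):
        raise ValueError("gains and costs must have the same length")
    benefit = 0.0
    cost = 0.0
    examined = 0
    for gain, result_cost in zip(gains, costs):
        # Stop when enough useful information has been found
        # or when the user's patience (cost budget) is exhausted.
        if benefit >= benefit_limit or cost >= cost_limit:
            break
        benefit += gain
        cost += result_cost
        examined += 1
    return examined


def bounded_gain(gains: Sequence[float], costs: Sequence[float],
                 benefit_limit: float, cost_limit: float) -> float:
    """Benefit collected up to the stopping point, capped at benefit_limit."""
    n = stopping_rank(gains, costs, benefit_limit, cost_limit)
    return min(sum(gains[:n]), benefit_limit)


if __name__ == "__main__":
    gains = [1.0, 0.0, 2.0, 3.0]   # hypothetical relevance gains per rank
    costs = [1.0, 1.0, 1.0, 1.0]   # hypothetical examination cost per rank
    # Benefit-limited session: the user is satisfied before patience runs out.
    print(stopping_rank(gains, costs, benefit_limit=3.0, cost_limit=10.0))
    # Cost-limited session: patience runs out first.
    print(stopping_rank(gains, costs, benefit_limit=100.0, cost_limit=2.0))
```

Setting the cost limit very high reduces the sketch to a purely benefit-limited user, and setting the benefit limit very high reduces it to a fixed-cutoff user, which mirrors the two simplified criteria the abstract attributes to existing metrics.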

Automatic Search Engine Performance Evaluation With The Wisdom Of Crowds

Automatic Search Engine Evaluation Based On User Behavior Analysis

Automatic Search Engine Performance Evaluation with Click-Through Data Analysis

Automatic Search Engine Performance Evaluation Based on User Behavior Analysis

A Multi-View Semi-Supervised Approach for Task-Level Web Search Success Evaluation.

Meta-evaluation of Online and Offline Web Search Evaluation Metrics

Behavior-based evaluation of session satisfaction

On Annotation Methodologies for Image Search Evaluation

Investigation Of User Search Behavior While Facing Heterogeneous Search Services

Incorporating Query Reformulating Behavior into Web Search Evaluation

Why Don't You Click: Understanding Non-Click Results in Web Search with Brain Signals

Leveraging Human-AI Collaboration in Crowd-Powered Source Search: A Preliminary Study

Relevance Estimation with Multiple Information Sources on Search Engine Result Pages.

When does Relevance Mean Usefulness and User Satisfaction in Web Search?

Efficiently collecting relevance information from clickthroughs for web retrieval system evaluation.

Satisfaction Prediction of Web Search Users

Evaluating Web Search with a Bejeweled Player Model.

CrowdSeed: query processing on microblogs

Crowd-Selection Query Processing in Crowdsourcing Databases: A Task-Driven Approach.

Evaluation Of Web-Based Search Engines From The End-User's Perspective: A Pilot Study

AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs