RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance

Paulo Henrique Couto, Quang Phuoc Ho, Nageeta Kumari, Benedictus Kent Rachmat, Thanh Gia Hieu Khuong, Ihsan Ullah, Lisheng Sun-Hosoya

2024-06-13

Abstract: Recent advancements in Artificial Intelligence (AI), particularly the widespread adoption of Large Language Models (LLMs), have significantly enhanced text analysis capabilities. This technological evolution offers considerable promise for automating the review of scientific papers, a task traditionally managed through peer review by fellow researchers. Despite its critical role in maintaining research quality, the conventional peer-review process is often slow and subject to biases, potentially impeding the swift propagation of scientific knowledge. In this paper, we propose RelevAI-Reviewer, an automatic system that frames the task of survey paper review as a classification problem: assessing the relevance of a paper to a specified prompt, analogous to a "call for papers". To address this, we introduce a novel dataset comprising 25,164 instances. Each instance contains one prompt and four candidate papers, each varying in relevance to the prompt. The objective is to develop a machine learning (ML) model capable of determining the relevance of each paper and identifying the most pertinent one. We explore various baseline approaches, including traditional ML classifiers such as Support Vector Machine (SVM) and advanced language models such as BERT. Preliminary findings indicate that the BERT-based end-to-end classifier outperforms the conventional ML methods. We present this problem as a public challenge to foster engagement and interest in this area of research.

Computation and Language, Machine Learning

What problem does this paper attempt to address?

The paper addresses two weaknesses of traditional peer review of scientific papers: it is slow, and it is subject to bias. The authors propose an automated system, RelevAI-Reviewer, which treats survey paper review as a classification problem: given a prompt (similar to a "call for papers"), a machine learning model must determine how relevant each candidate paper is and identify the most relevant one. To support this task, the researchers constructed a new dataset of 25,164 instances, each consisting of one prompt and four candidate papers with varying degrees of relevance. They explored several baselines, including traditional machine learning classifiers such as Support Vector Machines (SVM) and advanced language models such as BERT. In their preliminary results, the BERT-based end-to-end classifier outperformed the traditional machine learning methods. The paper also presents the problem as a public challenge, with the aim of attracting more researchers to work on automating the scientific paper review process.
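To make the task format concrete, the following is a minimal sketch of one instance (one prompt, four candidate papers) and a naive bag-of-words cosine-similarity ranker. This is not the authors' SVM or BERT baseline, and it does not use their dataset: the `Instance` class, the helper functions, and the example texts are all hypothetical and serve only to illustrate "rank the candidates and pick the most relevant one".

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical container for one instance: a prompt plus four candidates."""
    prompt: str
    candidates: list[str]  # candidate paper texts, varying in relevance


def tokenize(text: str) -> Counter:
    """Return lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(instance: Instance) -> list[int]:
    """Return candidate indices sorted from most to least similar to the prompt."""
    prompt_vec = tokenize(instance.prompt)
    scores = [cosine(prompt_vec, tokenize(c)) for c in instance.candidates]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def most_relevant(instance: Instance) -> int:
    """Return the index of the candidate judged most relevant to the prompt."""
    return rank_candidates(instance)[0]


# Made-up example texts, not taken from the paper's dataset.
example = Instance(
    prompt="survey of large language models for automated peer review",
    candidates=[
        "A study of protein folding with molecular dynamics simulations",
        "Large language models for automated peer review: a survey",
        "Deep learning for image segmentation in medical imaging",
        "A survey of reinforcement learning for robotics",
    ],
)

ranking = rank_candidates(example)
best = most_relevant(example)
```

A lexical-overlap scorer like this one only rewards shared words, which is one reason a learned classifier might be expected to do better on prompts and papers that are related but phrased differently.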
