AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews

Keith Tyser, Ben Segev, Gaston Longhitano, Xin-Yu Zhang, Zachary Meeks, Jason Lee, Uday Garg, Nicholas Belsten, Avi Shporer, Madeleine Udell, Dov Te'eni, Iddo Drori
2024-08-20
Abstract: Automatic reviewing helps handle a large volume of papers, provides early feedback and quality control, reduces bias, and allows the analysis of trends. We evaluate the alignment of automatic paper reviews with human reviews using an arena of human preferences by pairwise comparisons. Gathering human preference may be time-consuming; therefore, we also use an LLM to automatically evaluate reviews to increase sample efficiency while reducing bias. In addition to evaluating human and LLM preferences among LLM reviews, we fine-tune an LLM to predict human preferences, predicting which reviews humans will prefer in a head-to-head battle between LLMs. We artificially introduce errors into papers and analyze the LLM's responses to identify limitations, use adaptive review questions, meta prompting, role-playing, integrate visual and textual analysis, use venue-specific reviewing materials, and predict human preferences, improving upon the limitations of the traditional review processes. We make the reviews of publicly available arXiv and open-access Nature journal papers available online, along with a free service which helps authors review and revise their research papers and improve their quality. This work develops proof-of-concept LLM reviewing systems that quickly deliver consistent, high-quality reviews and evaluate their quality. We mitigate the risks of misuse, inflated review scores, overconfident ratings, and skewed score distributions by augmenting the LLM with multiple documents, including the review form, reviewer guide, code of ethics and conduct, area chair guidelines, and previous year statistics, by finding which errors and shortcomings of the paper may be detected by automated reviews, and evaluating pairwise reviewer preferences. This work identifies and addresses the limitations of using LLMs as reviewers and evaluators and enhances the quality of the reviewing process.
Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the challenges of the academic peer-review process, in particular how to review a rapidly growing number of papers efficiently and fairly. It introduces automatic review systems based on large language models (LLMs) that target the following problems:

1. **Handling a large number of papers**: As scientific research output grows rapidly, the traditional peer-review process struggles to keep up with the number of submissions. An automatic review system can handle more papers and provide early feedback.
2. **Reducing bias**: Human reviewers may be influenced by personal biases. The paper argues that an AI-driven review system can reduce such biases by applying standardized evaluation criteria, thereby improving the fairness of the review.
3. **Improving review efficiency and quality**: An automatic review system can quickly generate consistent, high-quality reviews that help authors improve their papers and provide useful feedback to the academic community.
4. **Trend analysis and collaboration discovery**: Large-scale automatic review makes it easier to analyze research trends and helps researchers find potential collaboration opportunities.
5. **Increasing the community's attention to high-quality papers**: Automatically reviewing a large number of papers and publishing the results can guide readers toward high-quality research instead of relying solely on popularity or advertising.

To achieve these goals, the paper proposes three main AI review systems:

- **OpenReviewer**: A platform for automatic peer review that gives authors immediate review feedback.
- **Papers with Reviews**: An online platform that collects and publishes papers from arXiv and open-access Nature journals together with their reviews.
- **Reviewer Arena**: A service that evaluates reviewer quality through direct, anonymous pairwise comparison of reviews generated by humans and LLMs.

In addition, the paper introduces four review evaluation methods: human evaluation, automatic LLM evaluation, automatic prediction of human preferences, and automatic discovery of the limitations of LLM reviews. Together, these systems and methods are intended to make the academic review process more efficient, transparent, and fair.
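The pairwise comparisons behind an arena of reviewer preferences can be turned into a ranking in several ways. The following is a minimal sketch that uses Elo-style rating updates; the source does not say which rating scheme the paper uses, so Elo is an assumption here, and the function names, the reviewer labels, the starting rating of 1000, and the K-factor of 32 are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0):
    """Apply one pairwise outcome in place: the preferred review gains points."""
    expected_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - expected_win)
    ratings[winner] += delta
    ratings[loser] -= delta


def rate_reviewers(battles, initial=1000.0, k=32.0):
    """Compute ratings from an ordered list of (preferred, not_preferred) pairs.

    Assumption: each battle is a single judgment (human or LLM) about which
    of two reviews of the same paper is better. Ties are not modeled.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in battles:
        update_elo(ratings, winner, loser, k)
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical judgments; the labels are placeholders, not real systems.
    example_battles = [
        ("reviewer_a", "reviewer_b"),
        ("reviewer_b", "reviewer_c"),
        ("reviewer_a", "reviewer_c"),
    ]
    ranking = sorted(rate_reviewers(example_battles).items(),
                     key=lambda item: item[1], reverse=True)
    for name, rating in ranking:
        print(f"{name}: {rating:.1f}")
```

Elo updates depend on the order of the battles, so with few comparisons the ranking can be unstable; an order-independent model such as Bradley-Terry would be a reasonable alternative under the same input format.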