Shallow Synthesis of Knowledge in GPT-Generated Texts: A Case Study in Automatic Related Work Composition

Anna Martin-Boyle, Aahan Tyagi, Marti A. Hearst, Dongyeop Kang
2024-02-20
Abstract: Numerous AI-assisted scholarly applications have been developed to aid different stages of the research process. We present an analysis of AI-assisted scholarly writing generated with ScholaCite, a tool we built that is designed for organizing literature and composing Related Work sections for academic papers. Our evaluation method focuses on the analysis of citation graphs to assess the structural complexity and inter-connectedness of citations in texts and involves a three-way comparison between (1) original human-written texts, (2) purely GPT-generated texts, and (3) human-AI collaborative texts. We find that GPT-4 can generate reasonable coarse-grained citation groupings to support human users in brainstorming, but fails to perform detailed synthesis of related works without human intervention. We suggest that future writing assistant tools should not be used to draft text independently of the human author.
Computation and Language
What problem does this paper attempt to address?
The paper addresses how to evaluate the quality of Related Work sections produced with AI-assisted academic writing tools. The authors build citation graphs from the texts to measure structural complexity and citation interconnectivity, and compare three conditions: (1) original human-written texts, (2) purely GPT-generated texts, and (3) human-AI collaborative texts. The main goal is to quantify how well AI-generated text synthesizes and contextualizes the relevant literature, and so to determine whether it meets the standards of high-quality academic writing.

### Main Research Questions:
1. **Can AI-generated Related Work sections effectively integrate and contextualize multiple citations as human-written ones do?**
2. **Is the quality of human-AI collaborative Related Work sections between that of purely human-written and purely AI-generated ones?**
3. **How can the quality of AI-generated academic texts be objectively and reproducibly evaluated?**

### Research Background:
- **Acceleration of Knowledge Creation**: A notable feature of modern research is the rapid growth of knowledge creation. For example, from 1952 to 2020, the number of scientific publications grew at an annual rate of 5.08%, doubling every 14 years.
- **Development of AI-Assisted Tools**: To cope with this growth, many AI-assisted tools have been developed for different stages of the research process, such as literature discovery, reading interfaces, and writing tools.
- **Application of ChatGPT**: Scholars have begun to use ChatGPT directly in the academic writing process, but formal evaluations of its output are still rare.

### Research Methods:
- **Dataset**: Award-winning papers from the 2023 ACL conference are used to simulate Works-in-Progress (WIPs), and the citations are extracted from their Related Work sections.
- **ScholaCite System**: Developed a GPT-4-based tool to organize citations and generate Related Work sections. The tool first groups citations based on thematic similarity and relevance, then generates the text for the Related Work sections.
- **Evaluation Methods**: By constructing citation graphs and calculating metrics such as the number of edges, average node degree, density, and clustering coefficient, the quality of Related Work sections generated under different conditions is objectively assessed.

### Main Findings:
- **Human-Written Texts Have Higher Citation Interconnectivity**: Human-written texts significantly outperform GPT-generated texts in terms of the number of citations, average node degree, density, and clustering coefficient.
- **GPT-Assisted Texts Perform Between Human-Written and GPT-Generated Texts**: GPT-assisted texts approach the level of human-written texts in some metrics but still show gaps in others.
- **Effectiveness of Citation Graph Analysis**: Citation graph analysis is an objective, reproducible, and scalable method that can effectively evaluate the quality of AI-generated academic texts.

### Conclusion:
- **AI-Generated Texts Perform Well in Rough Citation Grouping** but lack the ability to synthesize related work in detail.
- **Future Writing Assistance Tools Should Collaborate with Human Authors** rather than generate texts independently.
- **Citation Graph Analysis Provides a New Method for Evaluating AI-Generated Academic Texts**, helping to understand the capabilities and limitations of AI in academic writing.
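
### Illustrative Sketch of the Graph Metrics:
The four metrics named under Evaluation Methods (number of edges, average node degree, density, and clustering coefficient) can be computed from any undirected citation graph. The Python sketch below is a minimal illustration and not the authors' implementation. In particular, the edge rule (two citations are linked when they appear in the same sentence) is an assumption made here for the example, since this summary does not say how the paper defines edges. The function names `build_citation_graph` and `graph_metrics` and the sample data are likewise hypothetical.

```python
from itertools import combinations


def build_citation_graph(sentences):
    """Build an undirected graph as a dict mapping citation key -> set of neighbors.

    ASSUMPTION: two citations are linked when they occur in the same sentence.
    This rule is illustrative only; the summary does not define the edges.
    """
    graph = {}
    for cited in sentences:
        unique = sorted(set(cited))
        for key in unique:
            graph.setdefault(key, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def graph_metrics(graph):
    """Return edge count, average degree, density, and average clustering."""
    n = len(graph)
    # Each undirected edge is stored twice, once per endpoint.
    edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    avg_degree = 2 * edges / n if n else 0.0
    density = 2 * edges / (n * (n - 1)) if n > 1 else 0.0

    # Local clustering: fraction of a node's neighbor pairs that are linked.
    # Nodes with fewer than two neighbors contribute 0.
    local = []
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            local.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        local.append(2 * links / (k * (k - 1)))
    clustering = sum(local) / n if n else 0.0

    return {
        "edges": edges,
        "avg_degree": avg_degree,
        "density": density,
        "clustering": clustering,
    }


# Hypothetical input: each inner list holds the citations found in one sentence.
example_sentences = [["A", "B", "C"], ["C", "D"], ["E"]]
metrics = graph_metrics(build_citation_graph(example_sentences))
print(metrics)
```

Under this edge rule, a text that discusses several works together in one sentence yields more edges and higher clustering than a text that lists each citation in isolation, which is the kind of difference the summary reports between human-written and GPT-generated sections.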