Abstract: Large language models (LLMs) have transformed human writing by enhancing grammar correction, content expansion, and stylistic refinement. However, their widespread use raises concerns about authorship, originality, and ethics, and may even threaten scholarly integrity. Existing detection methods, which mainly rely on single-feature analysis and binary classification, often fail to identify LLM-generated text effectively in academic contexts. To address these challenges, we propose a novel Multi-level Fine-grained Detection (MFD) framework that detects LLM-generated text by integrating low-level structural, high-level semantic, and deep-level linguistic features, while conducting sentence-level evaluations of lexicon, grammar, and syntax for comprehensive analysis. To improve detection of subtle differences in LLM-generated text and enhance robustness against paraphrasing, we apply two mainstream evasion techniques to rewrite the text. These rewritten variants, along with the original texts, are used to train a text encoder via contrastive learning, which extracts high-level semantic features of sentences to boost detection generalization. Furthermore, we leverage an advanced LLM to analyze the entire text and extract deep-level linguistic features, enhancing the model's ability to capture complex patterns and nuances while effectively incorporating contextual information. Extensive experiments on public datasets show that the MFD model outperforms existing methods, achieving an MAE of 0.1346 and an accuracy of 88.56%. Our research provides institutions and publishers with an effective mechanism to detect LLM-generated text, mitigating risks of compromised authorship. Educators and editors can use the model's predictions to refine verification and plagiarism prevention protocols, ensuring adherence to standards.
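The contrastive training step mentioned in the abstract can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch under stated assumptions, not the paper's actual objective: the abstract does not specify the loss, the pairing scheme, or the encoder. Here we assume that a text and its paraphrased rewrite form a positive pair, that other texts act as negatives, and that embeddings are plain lists of floats. The names `cosine`, `info_nce`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor embedding.

    Assumption (not from the abstract): `positive` is the embedding of a
    paraphrased rewrite of the anchor text, and `negatives` are embeddings
    of unrelated texts.
    """
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability assigned to the positive candidate.
    return log_denominator - logits[0]


# Toy 2-d embeddings: the loss is lower when the rewrite stays close
# to the anchor than when an unrelated text is treated as the positive.
anchor = [1.0, 0.0]
loss_close = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
loss_far = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(loss_close < loss_far)
```

Minimizing a loss of this kind pulls a text and its rewrite together in embedding space, which is one plausible way an encoder could become less sensitive to paraphrasing.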

Structuring Authenticity Assessments on Historical Documents using LLMs

Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity

Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility

If the Sources Could Talk: Evaluating Large Language Models for Research Assistance in History

Sui Generis: Large Language Models for Authorship Attribution and Verification in Latin

Enhancing Answer Attribution for Faithful Text Generation with Large Language Models

Attribute or Abstain: Large Language Models as Long Document Assistants

Under the Surface: Tracking the Artifactuality of LLM-Generated Data

Automatic Identification of Types of Alterations in Historical Manuscripts

TruthReader: Towards Trustworthy Document Assistant Chatbot with Reliable Attribution

A Survey on Automatic Credibility Assessment of Textual Credibility Signals in the Era of Large Language Models

I'm Spartacus, No, I'm Spartacus: Measuring and Understanding LLM Identity Confusion

Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs

Unveiling Large Language Models Generated Texts: A Multi-Level Fine-Grained Detection Framework

Latin writing styles analysis with Machine Learning: New approach to old questions

From PDFs to Structured Data: Utilizing LLM Analysis in Sports Database Management

$\forall$uto$\exists$val: Autonomous Assessment of LLMs in Formal Synthesis and Interpretation Tasks

Moving Beyond ChatGPT: Local Large Language Models (LLMs) and the Secure Analysis of Confidential Unstructured Text Data in Social Work Research

Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges

Handwriting Identification of Short Historical Manuscripts