Abstract: Identifying why an author cites another work is essential for understanding the nature of scientific contributions and assessing their impact. Citations are one of the pillars of scholarly communication, yet most metrics used to analyze these conceptual links rely on quantitative observations. Behind the act of referencing another scholarly work lies a whole world of meanings that needs to be revealed proficiently and effectively. This study emphasizes the importance of reliably classifying citation intents to support more comprehensive and insightful analyses in research assessment. We address this task with advanced ensemble strategies for Citation Intent Classification (CIC) that incorporate Language Models (LMs) and employ Explainable AI (XAI) techniques to enhance the interpretability and trustworthiness of the models' predictions. Our approach involves two ensemble classifiers that use fine-tuned SciBERT and XLNet LMs as baselines. We further demonstrate the critical role of section titles as a feature for improving model performance. The study also introduces a web application for classifying citation intents, developed with Flask and currently available at http://137.204.64.4:81/cic/classifier. One of our models sets a new state of the art (SOTA) with an 89.46% Macro-F1 score on the SciCite benchmark. The integration of XAI techniques provides insight into the decision-making processes, highlighting the contributions of individual words to level-0 classifications and of individual models to the metaclassification. The findings suggest that including section titles significantly enhances classification performance in the CIC task. Our contributions provide useful insights for developing more robust datasets and methodologies, thus fostering a deeper understanding of scholarly communication.
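
For readers unfamiliar with the Macro-F1 metric reported above, the following is a minimal sketch of how it is computed: the F1 score is calculated separately for each class and then averaged without weighting, so rare classes count as much as frequent ones. The function name `macro_f1` and the toy labels and predictions are illustrative assumptions, not the authors' evaluation code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three intent labels (made-up data, for illustration only)
y_true = ["background", "background", "background", "method", "result"]
y_pred = ["background", "background", "method", "method", "result"]
score = macro_f1(y_true, y_pred)
print(f"Macro-F1: {score:.4f}")
```

In this toy case four of five predictions are correct (accuracy 0.80), while the per-class F1 scores are 0.80, 0.67, and 1.00, which shows how a single error on a small class shifts Macro-F1 differently from accuracy.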
