Scientific publications clustering using textual and citation information

Nacim Fateh Chikhi
DOI: https://doi.org/10.1016/j.eswa.2024.123319
IF: 8.5
2024-02-09
Expert Systems with Applications
Abstract: Scientific publications clustering has attracted much attention, and many different approaches have been proposed. One of the challenges in scientific document clustering is how to combine citation and textual information to improve clustering quality. In this paper, we explore the use of the von Mises-Fisher distribution for scientific document clustering. The von Mises-Fisher distribution is particularly well suited to the analysis of directional data. More precisely, we propose a multi-view version of the mixture of von Mises-Fisher distributions in which one view corresponds to textual information and the other to citation information. The hypothesis underlying our approach is that both text and citation data are directional. To estimate the parameters of the proposed model, we use the Expectation-Maximization algorithm along with deterministic annealing to escape poor local maxima. Experiments on two real-world datasets show that our algorithm outperforms baseline algorithms in terms of clustering accuracy.
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science
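
The abstract names the ingredients (a two-view mixture of von Mises-Fisher distributions, fitted by Expectation-Maximization with deterministic annealing) but gives no update equations. The following is a minimal sketch of how those pieces could fit together, not the paper's algorithm. It rests on several assumptions that are not in the source: the two views are treated as conditionally independent given the cluster, the concentration parameters kappa_text and kappa_cite are fixed and shared across clusters (so the Bessel-function normalizer cancels in the E-step), and the annealing schedule betas is arbitrary. The function name multiview_movmf and all parameter names are hypothetical.

```python
import numpy as np

def l2_normalize(X, eps=1e-12):
    """Project each row onto the unit sphere (directional data)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def multiview_movmf(X_text, X_cite, n_clusters,
                    kappa_text=20.0, kappa_cite=20.0,
                    betas=(0.1, 0.3, 0.6, 1.0), n_iter=30, seed=0):
    """Sketch of a two-view vMF mixture fitted by annealed EM.

    Assumptions (not from the paper): fixed, shared concentrations per
    view, views independent given the cluster, hand-picked beta schedule.
    Returns hard labels and the (n_docs, n_clusters) responsibilities.
    """
    rng = np.random.default_rng(seed)
    views = [l2_normalize(np.asarray(X_text, dtype=float)),
             l2_normalize(np.asarray(X_cite, dtype=float))]
    kappas = [kappa_text, kappa_cite]
    n_docs = views[0].shape[0]

    # Random soft assignment as the starting point.
    resp = rng.dirichlet(np.ones(n_clusters), size=n_docs)

    # Deterministic annealing: raise beta (inverse temperature) toward 1.
    for beta in betas:
        for _ in range(n_iter):
            # M-step: mixing weights and one mean direction per cluster and view.
            pi = resp.mean(axis=0)
            mus = [l2_normalize(resp.T @ X) for X in views]

            # E-step: with fixed kappa the vMF log-density is
            # kappa * mu^T x plus a constant that cancels across clusters.
            log_p = np.log(pi + 1e-12)
            for X, mu, kappa in zip(views, mus, kappas):
                log_p = log_p + kappa * (X @ mu.T)

            # Temper the posterior, then normalize stably in log space.
            log_p = beta * log_p
            log_p -= log_p.max(axis=1, keepdims=True)
            resp = np.exp(log_p)
            resp /= resp.sum(axis=1, keepdims=True)

    return resp.argmax(axis=1), resp
```

Under these assumptions, X_text could be a TF-IDF matrix and X_cite a row-wise citation profile of the same documents; both are normalized to unit length inside the function, which is what makes the directional model applicable. Estimating the concentrations per cluster would require the Bessel-function normalizer and is deliberately left out of this sketch.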