How Popular is Your Paper? An Empirical Study of the Citation Distribution

S. Redner
DOI: https://doi.org/10.1007/s100510050359
1998-04-17
Abstract: Numerical data for the distribution of citations are examined for: (i) papers published in 1981 in journals which are catalogued by the Institute for Scientific Information (783,339 papers) and (ii) 20 years of publications in Physical Review D, vols. 11-50 (24,296 papers). A Zipf plot of the number of citations to a given paper versus its citation rank appears to be consistent with a power-law dependence for leading rank papers, with exponent close to -1/2. This, in turn, suggests that the number of papers with x citations, N(x), has a large-x power law decay N(x)~x^{-alpha}, with alpha approximately equal to 3.
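
The step from a Zipf exponent of -1/2 to \( \alpha \approx 3 \) can be sketched with a standard continuum argument. This is an illustrative derivation rather than a quotation from the paper: \( r(x) \) (the rank of a paper with \( x \) citations) and \( x_r \) (the citation count of the paper at rank \( r \)) are notation introduced here, and \( N(x) \) is treated as a smooth density with \( \alpha > 1 \).

\[
\begin{aligned}
r(x) &\approx \int_x^{\infty} N(x')\,dx' \;\propto\; \int_x^{\infty} x'^{-\alpha}\,dx' \;\propto\; x^{1-\alpha}, \\
x_r &\propto r^{-1/(\alpha-1)}, \\
-\frac{1}{\alpha-1} &= -\frac{1}{2} \;\Longrightarrow\; \alpha = 3 .
\end{aligned}
\]

The rank of a paper is the number of papers cited at least as often, so the Zipf plot is the inverse of the cumulative distribution. A slope of -1/2 on the Zipf plot therefore corresponds to a density exponent of 3.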
Statistical Mechanics
What problem does this paper attempt to address?
The paper addresses how the citation counts of scientific papers are distributed. Specifically, the author focuses on:

1. **Distribution of citation counts**: How can the statistical distribution of citation counts be quantified and described? In particular, can it be described by a mathematical function such as a power law or a stretched exponential?
2. **Characteristics of highly cited papers**: Where do papers with extremely high citation counts sit in the overall distribution? How do their citation patterns differ from those of ordinary papers?
3. **Impact of time evolution**: How does the distribution of citation counts change over time? How do the citation counts of older papers differ from those of recently published papers?

### Main research content

The author explores these questions by analyzing two large-scale data sets:

- **ISI data set**: 783,339 papers published in 1981 and their citations as of June 1997.
- **PRD data set**: 24,296 papers published in Physical Review D from 1975 to 1994 and their citations.

### Key findings

1. **Power-law characteristics of the citation distribution**: From Zipf plots (citation count versus citation rank), the author finds that for highly cited papers the number of papers \( N(x) \) with \( x \) citations can be described by the power law \( N(x)\sim x^{-\alpha} \), with exponent \( \alpha\approx3 \).
2. **Citation differences across time periods**: For different time periods of the PRD data set (such as 1975-1979 and 1990-1994), the distributions of rarely cited papers are similar, while the distributions of highly cited papers differ significantly. This indicates that the influence of highly cited papers lasts longer.
3. **Time evolution of the citation distribution**: The distribution of citation counts keeps changing over time. In particular, both the number of highly cited papers and their citation counts are still increasing. The current data may therefore not be sufficient to determine the final form of the citation distribution.

### Conclusion

The study shows that the citation distribution of scientific papers has complex statistical characteristics and cannot be described by a single simple function. For highly cited papers, however, the distribution shows clear power-law behavior, which provides a quantitative perspective on academic influence. The time evolution of the distribution also suggests that longer-term data are needed to describe this phenomenon accurately. Together, these results give a quantitative basis for understanding the influence of scientific papers and the mechanisms behind it.
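
The Zipf-plot analysis described above can be illustrated with a minimal Python sketch. It does not use the ISI or PRD data: it draws synthetic "citation counts" from a continuous power law with \( \alpha = 3 \), fits a straight line to log(citations) versus log(rank) for the leading ranks, and converts the slope back to an estimate of \( \alpha \). The function names `zipf_slope` and `alpha_from_zipf_slope`, the sample size, the cutoff `top_k`, and the use of an ordinary least-squares fit are all assumptions made for illustration, not the author's procedure.

```python
import math
import random


def zipf_slope(citations, top_k):
    """Least-squares slope of log(citations) versus log(rank) over the top_k ranks.

    Rank 1 is the most cited paper. All of the top_k counts must be positive,
    because the fit is done in log-log space.
    """
    ordered = sorted(citations, reverse=True)[:top_k]
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(count) for count in ordered]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def alpha_from_zipf_slope(slope):
    """Invert slope = -1/(alpha - 1) to get the exponent of N(x) ~ x^(-alpha)."""
    return 1.0 - 1.0 / slope


# Synthetic data (an assumption, not the paper's data): inverse-transform
# sampling from a density proportional to x^(-alpha_true) on x >= 1,
# whose survival function is P(X > x) = x^(-(alpha_true - 1)).
alpha_true = 3.0
rng = random.Random(0)
samples = [
    (1.0 - rng.random()) ** (-1.0 / (alpha_true - 1.0))
    for _ in range(200_000)
]

slope = zipf_slope(samples, top_k=2000)
alpha_est = alpha_from_zipf_slope(slope)
print(f"Zipf slope: {slope:.3f}")
print(f"Estimated alpha: {alpha_est:.2f}")
```

For data drawn this way the fitted slope should be close to -1/2 and the estimated exponent close to 3. A straight-line fit on a log-log plot is only a rough diagnostic: the top few ranks fluctuate strongly, and the result depends on the choice of `top_k`.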