The relation between Pearson's correlation coefficient r and Salton's cosine measure

Leo Egghe, Loet Leydesdorff
DOI: https://doi.org/10.48550/arXiv.0911.1318
2009-11-06
Information Retrieval
Abstract: The relation between Pearson's correlation coefficient and Salton's cosine measure is revealed based on the different possible values of the division of the L1-norm and the L2-norm of a vector. These different values yield a sheaf of increasingly straight lines which form together a cloud of points, being the investigated relation. The theoretical results are tested against the author co-citation relations among 24 informetricians for whom two matrices can be constructed, based on co-citations: the asymmetric occurrence matrix and the symmetric co-citation matrix. Both examples completely confirm the theoretical results. The results enable us to specify an algorithm which provides a threshold value for the cosine above which none of the corresponding Pearson correlations would be negative. Using this threshold value can be expected to optimize the visualization of the vector space.
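As a minimal illustration of the two measures compared in the abstract, the sketch below computes Salton's cosine and Pearson's r for a pair of vectors, using the standard identity that Pearson's r equals the cosine of the mean-centered vectors. The function names and the sample vectors are assumptions chosen for illustration; this is not the threshold algorithm described in the paper, only a demonstration that a positive cosine can coexist with a negative Pearson correlation.

```python
import math


def cosine(x, y):
    """Salton's cosine: dot product divided by the product of the L2-norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson's r, computed as the cosine of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical example vectors (not data from the paper).
x = [1.0, 2.0, 3.0, 4.0]
y = [4.0, 3.0, 2.0, 1.0]

print("cosine :", cosine(x, y))   # positive, since all entries are non-negative
print("pearson:", pearson(x, y))  # close to -1, since y reverses the order of x
```

For non-negative data such as co-citation counts the cosine stays within [0, 1], while Pearson's r can be negative, which is why a cosine threshold that rules out negative correlations can be useful.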