The Normalization of Occurrence and Co-occurrence Matrices in Bibliometrics Using Cosine Similarities and Ochiai Coefficients

Qiuju Zhou, Loet Leydesdorff
DOI: https://doi.org/10.1002/asi.23603
2015-01-01
Journal of the Association for Information Science and Technology
Abstract: We prove that the Ochiai similarity of the co-occurrence matrix is equal to the cosine similarity in the underlying occurrence matrix. Neither the cosine nor the Pearson correlation should be used for the normalization of co-occurrence matrices, because the similarity is then normalized twice and therefore overestimated; the Ochiai coefficient can be used instead. Results are shown using a small matrix (5 cases, 4 variables) for didactic reasons, and also Ahlgren et al.'s (2003) co-occurrence matrix of 24 authors in library and information sciences. The overestimation is shown numerically and illustrated using multidimensional scaling and cluster dendrograms. If the occurrence matrix is not available, as in internet research or author cocitation analysis, using the Ochiai coefficient for the normalization is preferable to using the cosine.
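The identity stated in the abstract can be checked numerically. The following is a minimal sketch, assuming a binary occurrence matrix and a co-occurrence matrix whose diagonal holds each variable's total occurrences. The 5-by-4 matrix and the function names `cosine_columns` and `ochiai` are illustrative assumptions, not the data or code used in the paper.

```python
import numpy as np

# Hypothetical binary occurrence matrix: 5 cases (rows) x 4 variables (columns).
# These values are made up for illustration; they are not the paper's data.
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

# Co-occurrence matrix: entry (i, j) counts cases in which variables i and j
# both occur; the diagonal holds each variable's total number of occurrences.
C = A.T @ A


def cosine_columns(M):
    """Cosine similarity between every pair of columns of M."""
    norms = np.linalg.norm(M, axis=0)
    return (M.T @ M) / np.outer(norms, norms)


def ochiai(cooc):
    """Ochiai coefficient: c_ij / sqrt(c_ii * c_jj), using the diagonal totals."""
    d = np.sqrt(np.diag(cooc))
    return cooc / np.outer(d, d)


cosine_occurrence = cosine_columns(A)    # cosine on the occurrence matrix
ochiai_cooccurrence = ochiai(C)          # Ochiai on the co-occurrence matrix
cosine_cooccurrence = cosine_columns(C)  # cosine applied to the co-occurrence matrix

print(np.round(ochiai_cooccurrence, 3))
print(np.round(cosine_cooccurrence, 3))
```

In this sketch the first two results coincide, while applying the cosine to the co-occurrence matrix itself normalizes a second time and yields different values. When the diagonal totals are unavailable, as can happen with co-citation data, `ochiai` as written here would need those totals supplied from another source.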