An Analytical Approach to the Jaccard Similarity Index

Gonzalo Travieso, Alexandre Benatti, Luciano da F. Costa
2024-10-22
Abstract: The Jaccard similarity index has often been employed in science and technology to quantify the similarity between two sets. When modified to operate on real values, it can be applied to compare vectors, an operation that plays a central role in visualization, classification, and modeling. The present work develops an analytical approach for estimating the probability density of the Jaccard similarity values implied by a set of data elements characterized by specific statistical densities, with emphasis on the uniform and normal cases. Several theoretical and practical situations can benefit directly from such an approach, as it allows properties of the similarity comparisons within a given dataset to be better understood and anticipated. Applications include the estimation and visualization of data interrelationships in terms of similarity networks, as well as diverse problems in data analysis, pattern recognition, and scientific modeling. In addition to the analytical developments and results, examples are provided to illustrate the potential of the approach. The work also extends the reported developments to modifications of the Jaccard index intended for regularization and for control of the sharpness of the implemented comparisons.
Data Analysis, Statistics and Probability
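As a rough illustration of the quantities the abstract refers to, the sketch below computes the set-based Jaccard index and one common real-valued generalization (sum of element-wise minima over sum of element-wise maxima, for non-negative vectors), then draws an empirical sample of similarity values for uniformly distributed data. The abstract does not state which real-valued form the paper uses, so this formula, the function names, and the convention of returning 1.0 for two empty or all-zero inputs are all assumptions made for illustration; the sampling step is a numerical stand-in, not the paper's analytical method.

```python
import random


def jaccard_sets(a, b):
    """Set-based Jaccard index: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def jaccard_real(x, y):
    """Assumed real-valued form for non-negative vectors:
    sum(min(x_i, y_i)) / sum(max(x_i, y_i))."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("this form assumes non-negative entries")
    numerator = sum(min(p, q) for p, q in zip(x, y))
    denominator = sum(max(p, q) for p, q in zip(x, y))
    if denominator == 0:
        # Assumed convention: two all-zero vectors are treated as identical.
        return 1.0
    return numerator / denominator


def sample_similarities(n_pairs, dim, seed=0):
    """Draw pairs of vectors with i.i.d. uniform(0, 1) entries and
    return the similarity of each pair (empirical sample only)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_pairs):
        x = [rng.random() for _ in range(dim)]
        y = [rng.random() for _ in range(dim)]
        values.append(jaccard_real(x, y))
    return values


if __name__ == "__main__":
    print(jaccard_sets({1, 2, 3}, {2, 3, 4}))
    print(jaccard_real([1.0, 2.0], [2.0, 1.0]))
    sims = sample_similarities(n_pairs=10_000, dim=2)
    print(sum(sims) / len(sims))  # empirical mean similarity
```

A histogram of the values returned by `sample_similarities` gives a numerical picture of the kind of similarity density that an analytical treatment would describe in closed form.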