Interpolating between the Jaccard distance and an analogue of the normalized information distance
Bjørn Kjos-Hanssen
DOI: https://doi.org/10.1093/logcom/exac069
2022-11-24
Journal of Logic and Computation
Abstract: Jiménez, Becerra and Gelbukh (2013) defined a family of ‘symmetric Tversky ratio models’ $S_{\alpha ,\beta }$, $0\le \alpha \le 1$, $\beta>0$. Each function $D_{\alpha ,\beta }=1-S_{\alpha ,\beta }$ is a semimetric on the powerset of a given finite set. We show that $D_{\alpha ,\beta }$ is a metric if and only if $0\le \alpha \le \frac 12$ and $\beta \ge 1/(1-\alpha )$. This result is formally verified in the Lean proof assistant. The extreme points of this parametrized space of metrics are $\mathcal V_1=D_{1/2,2}$, the Jaccard distance, and $\mathcal V_{\infty }=D_{0,1}$, an analogue of the normalized information distance of M. Li, Chen, X. Li, Ma and Vitányi (2004). As a second interpolation, we also show that $\mathcal V_p$ is a metric for every $p$ with $1\le p\le \infty $, where
$$\varDelta_p(A,B)=\left(\lvert B\setminus A\rvert^p+\lvert A\setminus B\rvert^p\right)^{1/p},$$
$$\mathcal V_p(A,B)=\frac{\varDelta_p(A,B)}{\lvert A\cap B\rvert+\varDelta_p(A,B)},$$
and $\varDelta_\infty(A,B)=\max\{\lvert B\setminus A\rvert,\lvert A\setminus B\rvert\}$ is the limiting case.
computer science, theory & methods, logic
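The following Python sketch is not from the paper; it only makes the definitions in the abstract concrete. It assumes the symmetric Tversky ratio model in the form $S_{\alpha,\beta}(A,B)=\lvert A\cap B\rvert/(\lvert A\cap B\rvert+\beta(\alpha m+(1-\alpha)M))$ with $m=\min(\lvert A\setminus B\rvert,\lvert B\setminus A\rvert)$ and $M=\max(\lvert A\setminus B\rvert,\lvert B\setminus A\rvert)$. This form reproduces both extreme points named above ($D_{1/2,2}$ is the Jaccard distance, $D_{0,1}=\mathcal V_\infty$). The helper names `powerset`, `d_tversky`, `v_p` and `is_metric`, and the convention that the distance between two empty sets is 0, are hypothetical choices made for illustration. The sketch brute-forces the metric axioms over all subsets of a three-element set, using exact rational arithmetic for $D_{\alpha,\beta}$.

```python
from fractions import Fraction
from itertools import combinations, product
import math


def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def d_tversky(A, B, alpha, beta):
    """D_{alpha,beta}(A, B) = 1 - S_{alpha,beta}(A, B), computed exactly.

    Assumed form: S = |A & B| / (|A & B| + beta * (alpha*m + (1-alpha)*M)),
    m = min(|A - B|, |B - A|), M = max(|A - B|, |B - A|).
    Hypothetical convention: D(empty, empty) = 0.
    """
    inter = len(A & B)
    m, M = sorted((len(A - B), len(B - A)))
    alpha, beta = Fraction(alpha), Fraction(beta)
    extra = beta * (alpha * m + (1 - alpha) * M)
    if inter + extra == 0:
        return Fraction(0)
    # 1 - inter/(inter + extra) simplifies to extra/(inter + extra).
    return extra / (inter + extra)


def v_p(A, B, p):
    """V_p(A, B) = Delta_p / (|A & B| + Delta_p); p = math.inf means max."""
    x, y = len(B - A), len(A - B)
    delta = max(x, y) if p == math.inf else (x ** p + y ** p) ** (1 / p)
    inter = len(A & B)
    return 0.0 if inter + delta == 0 else delta / (inter + delta)


def is_metric(dist, sets, tol=0):
    """Check the metric axioms on every pair and triple drawn from `sets`.

    A False answer is a genuine counterexample; True only means that no
    violation exists among these particular sets.
    """
    for A, B in product(sets, repeat=2):
        d = dist(A, B)
        if d < 0 or (d == 0) != (A == B) or abs(d - dist(B, A)) > tol:
            return False
    for A, B, C in product(sets, repeat=3):
        if dist(A, C) > dist(A, B) + dist(B, C) + tol:
            return False
    return True


if __name__ == "__main__":
    S = powerset(range(3))
    # Jaccard distance D_{1/2,2}: satisfies the axioms on these sets.
    print(is_metric(lambda A, B: d_tversky(A, B, Fraction(1, 2), 2), S))  # True
    # alpha = 0, beta = 1/2 < 1/(1 - alpha): triangle inequality fails.
    print(is_metric(lambda A, B: d_tversky(A, B, 0, Fraction(1, 2)), S))  # False
    # V_2, with a small tolerance for floating-point rounding.
    print(is_metric(lambda A, B: v_p(A, B, 2), S, tol=1e-12))  # True
```

For $\alpha=0$, $\beta=\frac12$ the violation is already visible with $A=\{0\}$, $B=\{0,1\}$, $C=\{1\}$: $D(A,C)=1$ but $D(A,B)+D(B,C)=\frac13+\frac13$. Exact `Fraction` arithmetic is used for $D_{\alpha,\beta}$ so that boundary cases such as $\beta=1/(1-\alpha)$, where the triangle inequality can hold with equality, are not misjudged by rounding.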