Abstract: The use of distance metrics such as the Euclidean or Manhattan distance for nearest neighbour algorithms allows for interpretation as a geometric model, and it has been widely assumed that the metric axioms are a necessary condition for many data mining tasks. We show that this assumption can in fact be an impediment to producing effective models. We propose to use mass-based dissimilarity, which employs estimates of the probability mass to measure dissimilarity, to replace the distance metric. This substitution effectively converts nearest neighbour (NN) algorithms into Lowest Probability Mass Neighbour (LMN) algorithms. Both types of algorithms employ exactly the same algorithmic procedures, except for the substitution of the dissimilarity measure. We show that LMN algorithms overcome key shortcomings of NN algorithms in classification and clustering tasks. Unlike existing generalised data independent metrics (e.g., quasi-metric, meta-metric, semi-metric, peri-metric) and data dependent metrics, the proposed mass-based dissimilarity is unique because its self-dissimilarity is data dependent and non-constant.

• Learning with non-metric proximities: Frank-Michael Schleif

Abstract: Efficient learning of a data analysis task strongly depends on the data representation. Most methods rely on (symmetric) similarity or dissimilarity representations by means of metric inner products or distances, providing easy access to powerful mathematical formalisms like kernel or branch-and-bound approaches. Similarities and dissimilarities are, however, often naturally obtained by non-metric proximity measures, which cannot easily be handled by classical learning algorithms. In recent years major efforts have been undertaken to provide approaches which can either be used directly for such data or make standard methods available for this type of data. The presentation provides a comprehensive overview of the field of learning with non-metric proximities.
First, we introduce the formalism used in non-metric spaces and motivate specific treatments for non-metric proximity data. Second, we provide a systematization of the various approaches. For a few approaches we discuss complexity issues and generalization properties. We also address the problem of large-scale proximity learning, which is often overlooked in this context yet is of major importance for making the methods relevant in practice. The discussed algorithms and concepts are in general applicable to proximity-based clustering, one-class classification, classification, regression or embedding tasks. Various applications show the relevance of the discussed approaches, which provide a generic framework for multiple input formats. The goal of the presentation
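
The substitution described in the first abstract (the same neighbour-search procedure with a different dissimilarity measure) can be illustrated with a minimal sketch. It is an assumption-laden toy, not the estimator proposed in the paper: probability mass is approximated here by simple per-dimension interval counting, and the names `mass_dissimilarity`, `euclidean` and `nearest_neighbour` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def mass_dissimilarity(x, y, data):
    """Toy mass estimate (an assumption, not the paper's estimator):
    for each dimension, take the fraction of data points whose value lies
    in the closed interval spanned by x and y, then average over dimensions."""
    lo = np.minimum(x, y)
    hi = np.maximum(x, y)
    inside = (data >= lo) & (data <= hi)      # boolean array, shape (n, d)
    return float(inside.mean(axis=0).mean())

def euclidean(x, y, data=None):
    """Ordinary geometric distance; `data` is ignored (data independent)."""
    return float(np.linalg.norm(x - y))

def nearest_neighbour(query, data, dissimilarity):
    """Same search procedure for both measures: return the index of the
    data point with the lowest dissimilarity to the query."""
    scores = [dissimilarity(query, p, data) for p in data]
    return int(np.argmin(scores))

# Toy 2-D data: point 0 is geometrically close to the query but many other
# points share its coordinate ranges; point 1 is farther away but isolated.
data = np.array([
    [0.3, 0.3],
    [-1.0, -1.0],
    [0.1, 5.0],
    [0.2, 6.0],
    [5.0, 0.1],
    [6.0, 0.2],
])
query = np.array([0.0, 0.0])

nn_index = nearest_neighbour(query, data, euclidean)
lmn_index = nearest_neighbour(query, data, mass_dissimilarity)
print("Euclidean nearest neighbour:", nn_index)
print("Lowest-mass neighbour:", lmn_index)

# Self-dissimilarity under the toy estimate depends on the data:
# a value shared by several points carries more mass than a unique one.
dup = np.array([[0.0], [0.0], [0.0], [1.0]])
print(mass_dissimilarity(dup[0], dup[0], dup), mass_dissimilarity(dup[3], dup[3], dup))
```

On this toy data the two measures select different neighbours, because the geometrically closest point lies in a more densely populated coordinate range than the isolated one. The last lines show a self-dissimilarity that varies from point to point, which loosely mirrors the non-constant, data dependent self-dissimilarity mentioned in the abstract.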

Mp-Dissimilarity: A Data Dependent Dissimilarity Measure.

Data-dependent Dissimilarity Measure: an Effective Alternative to Geometric Distance Measures

A Comparative Study of Data-Dependent Approaches Without Learning in Measuring Similarities of Data Objects.

Data Dependent Dissimilarity Measures

A Novel Similarity Measure Model for Multivariate Time Series Based on LMNN and DTW

A Novel Structural Mass Based Dissimilarity Measure

Beyond TF-IDF and Cosine Distance in Documents Dissimilarity Measure

LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index

Overcoming Key Weaknesses of Distance-based Neighbourhood Methods Using a Data Dependent Dissimilarity Measure

Distribution-based Similarity Measures for Multi-Dimensional Point Set Retrieval Applications

Distributed Similarity Queries in Metric Spaces

Exploring Distributional Discrepancy for Multidimensional Point Set Retrieval

Similarity Search: A Matching Based Approach.

A Bi-metric Framework for Fast Similarity Search

Simple Supervised Dissimilarity Measure: Bolstering iForest-Induced Similarity with Class Information Without Learning.

DIMS: Distributed Index for Similarity Search in Metric Spaces

Matrix dissimilarities based on differences in moments and sparsity

Learning Similarity Measures in Non-Orthogonal Space.

Lowest Probability Mass Neighbour Algorithms: Relaxing the Metric Constraint in Distance-Based Neighbourhood Algorithms

Comparing Apples and Oranges: Measuring Differences between Exploratory Data Mining Results

Nearest-Neighbour-Induced Isolation Similarity and Its Impact on Density-Based Clustering.