Abstract: The use of distance metrics such as the Euclidean or Manhattan distance for nearest neighbour algorithms allows for interpretation as a geometric model, and it has been widely assumed that the metric axioms are a necessary condition for many data mining tasks. We show that this assumption can in fact be an impediment to producing effective models. We propose to use mass-based dissimilarity, which employs estimates of the probability mass to measure dissimilarity, to replace the distance metric. This substitution effectively converts nearest neighbour (NN) algorithms into Lowest Probability Mass Neighbour (LMN) algorithms. Both types of algorithms employ exactly the same algorithmic procedures, except for the substitution of the dissimilarity measure. We show that LMN algorithms overcome key shortcomings of NN algorithms in classification and clustering tasks. Unlike existing generalised data independent metrics (e.g., quasi-metric, meta-metric, semi-metric, peri-metric) and data dependent metrics, the proposed mass-based dissimilarity is unique because its self-dissimilarity is data dependent and non-constant.

• Learning with non-metric proximities: Frank-Michael Schleif
Abstract: Efficient learning of a data analysis task strongly depends on the data representation. Most methods rely on (symmetric) similarity or dissimilarity representations by means of metric inner products or distances, providing easy access to powerful mathematical formalisms like kernel or branch-and-bound approaches. However, similarities and dissimilarities are often naturally obtained from non-metric proximity measures, which cannot easily be handled by classical learning algorithms. In recent years, major efforts have been undertaken to provide approaches that can either be used directly for such data or make standard methods available for this type of data. The presentation provides a comprehensive overview of the field of learning with non-metric proximities.
First, we introduce the formalism used in non-metric spaces and motivate specific treatments for non-metric proximity data. Second, we provide a systematization of the various approaches. For a few approaches we discuss complexity issues and generalization properties. We also address the problem of large-scale proximity learning, which is often overlooked in this context and is of major importance for making the methods relevant in practice. The discussed algorithms and concepts are generally applicable to proximity-based clustering, one-class classification, classification, regression or embedding tasks. Various applications show the relevance of the discussed approaches, which provide a generic framework for multiple input formats. The goal of the presentation
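
As a rough illustration of the mass-based dissimilarity described in the first abstract above, the following one-dimensional toy sketch estimates the probability mass between two points as the fraction of data points lying in the closed interval they span. The abstract does not specify the estimator, so the interval-counting rule, the function names `mass_dissimilarity` and `nearest`, and the sample data are all assumptions made for illustration only. The sketch shows the two properties the abstract highlights: the neighbour-search procedure is unchanged apart from the dissimilarity measure, and the self-dissimilarity depends on the data.

```python
# Toy 1-D sketch (assumption: the "region covering x and y" is the closed
# interval [min(x, y), max(x, y)]; the abstract does not give the estimator).

def mass_dissimilarity(x, y, data):
    """Fraction of data points falling inside the interval spanned by x and y."""
    lo, hi = min(x, y), max(x, y)
    inside = sum(1 for p in data if lo <= p <= hi)
    return inside / len(data)

def nearest(query, candidates, dissim):
    """Same search procedure for NN and LMN; only `dissim` changes."""
    return min(candidates, key=lambda c: dissim(query, c))

# Hypothetical data: a dense cluster near 1.0 and one isolated point at 5.0.
data = [1.0, 1.0, 1.1, 1.2, 1.3, 1.4, 5.0]
query = 2.0
candidates = [1.0, 5.0]

# Nearest neighbour under the absolute (Euclidean, in 1-D) distance.
nn = nearest(query, candidates, lambda a, b: abs(a - b))

# Lowest probability mass neighbour under the toy mass estimate.
lmn = nearest(query, candidates, lambda a, b: mass_dissimilarity(a, b, data))

# Self-dissimilarity is not constant: it depends on how much data sits at the point.
self_dense = mass_dissimilarity(1.0, 1.0, data)
self_sparse = mass_dissimilarity(5.0, 5.0, data)

print("NN:", nn, "LMN:", lmn)
print("self-dissimilarity at 1.0:", self_dense, "at 5.0:", self_sparse)
```

In this toy setting the distance-based search picks the candidate in the dense cluster, while the mass-based search picks the isolated point, because far fewer data points lie between it and the query. Whether this matches the behaviour of the estimator used in the original work is not something the abstract confirms.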

A Novel Similarity Metric with Application to Big Process Data Analytics

A Novel Similarity Measure Model for Multivariate Time Series Based on LMNN and DTW

A Novel Similarity Measure Approach for Time Series Based on PLA and DTW

Monitoring and prediction of big process data with deep latent variable models and parallel computing

Neural Network Weight Comparison for Industrial Causality Discovering and Its Soft Sensing Application

Visual Process Monitoring by Data-Dependent Kernel Discriminant Analysis with T-Distributed Similarities

Measuring Similarity for Data-Aware Business Processes

Novel Multimode Process Soft Sensing Methods Based on the Dynamic Mixture Variational Autoencoder Regression Model

A Novel Statistical-Based Monitoring Approach for Complex Multivariate Processes

A Novel Label-Aware Global Graph Construction Method and Spiking-Coded Graph Neural Network for Intelligent Process Fault Diagnosis

A Novel Patent Similarity Measurement Methodology: Semantic Distance and Technological Distance

A New Similarity Space Tailored for Supervised Deep Metric Learning

A Novel Process Monitoring Approach Based On Feature Points Distance Dynamic Autoencoder

Data Dependent Dissimilarity Measures

Deep Metric Learning using Similarities from Nonlinear Rank Approximations

A novel similarity measure framework on financial data mining

Nonlinear Dynamic Process Monitoring Based on Discriminative Denoising Autoencoder and Canonical Variate Analysis

Robust Monitoring and Fault Isolation of Nonlinear Industrial Processes Using Denoising Autoencoder and Elastic Net

Concurrent static and dynamic dissimilarity analytics for fine-scale evaluation of process data distributions

A Dataset Similarity Evaluation Framework for Wireless Communications and Sensing

Dissimilarity Analytics for Monitoring of Nonstationary Industrial Processes with Stationary Subspace Decomposition