KPML: A Novel Probabilistic Perspective Kernel Mahalanobis Distance Metric Learning Model for Semi-supervised Clustering
Chao Wang, Yongyi Hu, Xiaofeng Gao, Guihai Chen
DOI: https://doi.org/10.1007/978-3-030-59051-2_17
2020-01-01
Abstract: Metric learning aims to transform the features of data into another representation based on given distance relationships, which may improve the performance of distance-based machine learning models. Most existing methods use the difference between the distances of similar pairs and those of dissimilar pairs as the training loss. This kind of loss function may lack interpretability: one can only observe a distance, or a difference of distances, which is an unbounded number that gives no sense of how large or small it really is. To make these metric learning models more explainable, in this paper we present a probabilistic theoretical analysis of metric learning, design a special loss function, and propose the Kernelized Probabilistic Metric Learning (KPML) approach. With all distance values transformed into probabilities, the results of the model can be compared and explained. In addition, to make effective use of both labeled and unlabeled data in semi-supervised clustering, we propose a KPML-based approach that combines metric learning and semi-supervised learning in a novel way. Finally, we evaluate our model on kNN-based semi-supervised clustering, and the results show that it significantly outperforms the baselines across various datasets.
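To make the interpretability argument concrete, the following minimal sketch contrasts an unbounded Mahalanobis distance with a bounded, probability-like score. It is an illustration only: the abstract does not give KPML's loss function or its distance-to-probability mapping, so the `exp(-d^2)` mapping, the function names `mahalanobis_sq` and `similarity_probability`, and the identity transform `L` are all assumptions made here, not the authors' method.

```python
import numpy as np


def mahalanobis_sq(x, y, L):
    """Squared Mahalanobis distance (x - y)^T M (x - y) with M = L^T L.

    Parameterizing M through L keeps it positive semidefinite by construction.
    """
    diff = L @ (x - y)
    return float(diff @ diff)


def similarity_probability(x, y, L):
    """Map a distance to a score in (0, 1].

    ASSUMPTION: exp(-d^2) is a stand-in chosen for illustration;
    it is not the mapping used in the paper.
    """
    return float(np.exp(-mahalanobis_sq(x, y, L)))


# Hypothetical toy data: identity transform, one close pair, one distant pair.
L = np.eye(2)
a = np.array([0.0, 0.0])
b = np.array([0.1, 0.0])
c = np.array([3.0, 4.0])

p_ab = similarity_probability(a, b, L)  # close pair, score near 1
p_ac = similarity_probability(a, c, L)  # distant pair, score near 0
```

The raw squared distances here (0.01 and 25) have no natural scale, whereas the mapped scores always lie in (0, 1], which is the kind of bounded quantity the abstract argues is easier to compare and explain.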