A Fast Diagonal Distance Metric Learning Approach for Large-Scale Datasets

Tie Li, Gang Kou, Yi Peng, Philip S. Yu
DOI: https://doi.org/10.1016/j.ins.2021.04.077
IF: 8.1
2021-01-01
Information Sciences
Abstract: Distance metric learning (DML) aims to learn distance metrics that reflect the interactions between features and labels. Because of their high computational complexity, existing DML models are unsuitable for large-scale datasets. This study proposes a DML approach for large-scale problems that reduces the number of variables, exploits the sparse structure of the optimization problems, and takes advantage of large-scale computation platforms. The proposed approach treats DML as a linear space transformation problem and suggests that a full DML matrix can be approximated by a diagonal matrix in many cases. We solve the diagonal DML problem, along with its ℓ1 and ℓ2 regularizations, via linear and quadratic programming. To facilitate large-scale learning, we design a MapReduce framework to build triplets, which are encapsulations of three data points used in the optimization problem, and develop a weighting mechanism for triplets according to their contributions to the overall distance distortion. Experiments show that the proposed approach is fast in large-scale DML applications, with accuracy comparable to that of much more time-consuming full-matrix models. Since the approach is implemented in Scala on the Spark platform, it can be used directly by production Java applications, which makes it highly practical for large-scale datasets. © 2021 The Authors. Published by Elsevier Inc.
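To make the idea of a diagonal metric and triplets concrete, here is a minimal Python sketch. It is not the authors' implementation (which the abstract says is written in Scala on Spark and solved by linear and quadratic programming). The function names, the (anchor, same-class, different-class) form of a triplet, the margin of 1.0, and the toy data are all assumptions made for illustration. The sketch only evaluates a given weight vector; it does not learn one.

```python
def diag_dist(w, x, y):
    """Squared distance under a diagonal metric: sum_d w[d] * (x[d] - y[d])**2."""
    return sum(wd * (xd - yd) ** 2 for wd, xd, yd in zip(w, x, y))


def build_triplets(labels):
    """Enumerate (anchor, same-class, different-class) index triplets.

    Assumed triplet form; brute force, suitable only for tiny examples.
    """
    n = len(labels)
    return [
        (i, j, k)
        for i in range(n)
        for j in range(n)
        for k in range(n)
        if i != j and labels[i] == labels[j] and labels[i] != labels[k]
    ]


def violated(w, X, triplets, margin=1.0):
    """Return triplets whose same-class point is not closer than the
    different-class point by at least `margin` (an assumed constraint)."""
    return [
        (i, j, k)
        for i, j, k in triplets
        if diag_dist(w, X[i], X[j]) + margin > diag_dist(w, X[i], X[k])
    ]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [(0.0, 0.0), (0.1, 5.0), (2.0, 0.2), (2.1, 5.1)]
labels = [0, 0, 1, 1]
triplets = build_triplets(labels)

equal_weights = violated((1.0, 1.0), X, triplets)    # treats both features alike
feature0_only = violated((1.0, 0.0), X, triplets)    # ignores the noisy feature
```

On this toy data, weighting only the first feature leaves no triplet violated, whereas equal weights leave some violated. A diagonal metric needs only one weight per feature, which is the reduction in variables the abstract refers to.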