Large Scale Metric Learning

Zay Maung Maung Aye, Kotagiri Ramamohanarao, Benjamin I. P. Rubinstein
DOI: https://doi.org/10.1109/IJCNN.2016.7727368
2016-01-01
Abstract: Many machine learning and pattern recognition algorithms rely heavily on good distance metrics to achieve competitive performance. While distance metrics can be learned, doing so is currently computationally infeasible on large datasets. In this paper, we propose two efficient and effective approaches for selecting the training dataset, one using Locality-Sensitive Hashing (LSH) with discriminative information and one using K-Means clustering inside LSH buckets, for accelerating metric learning. Our methods yield a speedup factor of (N/C)^2, where N is the training set size and C << N is the user-selected compressed set size. This quadratic speedup to metric learning is often realized as an improvement of one to two or more orders of magnitude, with little degradation to accuracy. For example, our generic filter approach enables the current overall fastest Large Margin Nearest Neighbor (LMNN) to learn metrics on one million samples in 6.8 minutes, down from 5.4 hours, a 48x speedup. LMNN and similar state-of-the-art methods use tree data structures to speed up nearest-neighbor queries, an advantage that degrades at higher dimensions. Our approach does not share this limitation.
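The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes random-hyperplane LSH and replaces the K-Means step with a single per-class centroid per bucket (effectively k = 1). The names `lsh_signature`, `compress_training_set`, `quadratic_speedup`, and the parameter `n_bits` are hypothetical and introduced here for illustration.

```python
import random
from collections import defaultdict


def lsh_signature(point, hyperplanes):
    """Sign pattern of the point against each random hyperplane (assumed LSH family)."""
    return tuple(
        sum(w * x for w, x in zip(plane, point)) >= 0.0 for plane in hyperplanes
    )


def compress_training_set(points, labels, n_bits=4, seed=0):
    """Return (representatives, rep_labels): one centroid per (bucket, class) pair.

    Simplification: a single mean per group stands in for K-Means inside buckets.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random hyperplanes through the origin define the hash bits.
    hyperplanes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    # Keying on the label as well keeps classes separate inside a bucket,
    # a simple way to bring label (discriminative) information into the hash.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[(lsh_signature(point, hyperplanes), label)].append(point)

    representatives, rep_labels = [], []
    for (_, label), members in groups.items():
        # Column-wise mean of the group's members.
        centroid = [sum(col) / len(members) for col in zip(*members)]
        representatives.append(centroid)
        rep_labels.append(label)
    return representatives, rep_labels


def quadratic_speedup(n, c):
    """Speedup factor (N/C)^2 as stated in the abstract."""
    return (n / c) ** 2
```

In this sketch the compressed size is not chosen directly; it is bounded by 2^`n_bits` times the number of classes, so `n_bits` acts as an indirect control on C. The compressed set would then be passed to a metric learner such as LMNN in place of the full training set.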