Abstract: In large-scale machine learning, a problem of central interest is approximate nearest neighbor (ANN) search, where the goal is to retrieve points that are close to a given query object under a given metric. In this paper, we develop a novel data-driven ANN search algorithm in which the data structure is learned by a fast spectral technique based on $s$ landmarks selected by approximate ridge leverage scores. We show that, with overwhelming probability, our algorithm returns the $(1+\epsilon/4)$-ANN for any approximation parameter $\epsilon \in (0,1)$. A remarkable feature of our algorithm is that it is computationally efficient. Specifically, learning $k$-length hash codes requires $O((s^3+ns^2)\log n)$ running time and $O(d^2)$ extra space, and returning the $(1+\epsilon/4)$-ANN of the query needs $O(k\log n)$ running time. Experimental results on computer vision and natural language understanding tasks demonstrate a significant advantage of our algorithm over state-of-the-art methods.
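
The abstract names the ingredients (landmarks chosen by ridge leverage scores, a spectral step, binary hash codes) but does not spell out the construction, so the following is only a rough sketch of how such a pipeline could be wired together. Everything specific in it is an assumption for illustration and not the paper's algorithm: the function names, the Gaussian kernel and its `gamma`, the ridge parameter `lam`, the use of exact rather than approximate leverage scores, the sign-of-projection binarization, and the linear Hamming scan. In particular, the scan below costs $O(nk)$ per query and does not achieve the $O(k\log n)$ query time claimed above.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def ridge_leverage_scores(K, lam):
    """Exact ridge leverage scores diag(K (K + lam*I)^{-1}).

    This is the O(n^3) textbook definition, used here only because it is
    short; the paper relies on an approximation instead (assumption).
    """
    n = K.shape[0]
    return np.diag(np.linalg.solve(K + lam * np.eye(n), K)).copy()

def fit_hash(X, s, k, lam=1e-2, gamma=1.0, seed=0):
    """Pick s landmarks by leverage-score sampling and learn k projections."""
    rng = np.random.default_rng(seed)
    scores = ridge_leverage_scores(rbf_kernel(X, X, gamma), lam)
    # Sample s distinct landmarks with probability proportional to their score.
    idx = rng.choice(X.shape[0], size=s, replace=False, p=scores / scores.sum())
    landmarks = X[idx]
    Z = rbf_kernel(X, landmarks, gamma)   # n x s landmark features
    mean = Z.mean(axis=0)
    Zc = Z - mean
    # Spectral step: top-k eigenvectors of the small s x s matrix Zc^T Zc.
    _, vecs = np.linalg.eigh(Zc.T @ Zc)   # eigenvalues come back in ascending order
    W = vecs[:, -k:]
    return {"landmarks": landmarks, "mean": mean, "W": W, "gamma": gamma}

def encode(model, Q):
    """Map the rows of Q to k-bit codes (boolean array of shape len(Q) x k)."""
    Z = rbf_kernel(Q, model["landmarks"], model["gamma"]) - model["mean"]
    return (Z @ model["W"]) > 0

def hamming_search(codes, q_code, top=5):
    """Linear scan: indices and Hamming distances of the closest codes."""
    dist = (codes != q_code).sum(axis=1)
    order = np.argsort(dist, kind="stable")[:top]
    return order, dist[order]

# Toy usage on random data (hypothetical sizes).
X_demo = np.random.default_rng(1).normal(size=(200, 8))
model_demo = fit_hash(X_demo, s=20, k=12, gamma=0.5)
codes_demo = encode(model_demo, X_demo)
nearest_idx, nearest_dist = hamming_search(codes_demo, codes_demo[0])
```

Forming the $s \times s$ matrix and decomposing it takes $O(ns^2 + s^3)$ operations in this sketch, which is why working with a small landmark set rather than the full $n \times n$ kernel keeps the spectral step cheap; the exact leverage-score computation above is the part a practical implementation would replace with an approximation.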

Fast LMNN Algorithm Through Random Sampling

Accelerating Exact Nearest Neighbor Search in High Dimensional Euclidean Space Via Block Vectors

A Fast Training Method for OC-SVM Based on the Random Sampling Lemma

Fast Point Cloud Sampling Network

Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback

Adaptive Square-Root Transformed Unscented FastSLAM with KLD-resampling

Parameter Free Large Margin Nearest Neighbor For Distance Metric Learning

Effective and General Distance Computation for Approximate Nearest Neighbor Search

Efficiently Learning a Distance Metric for Large Margin Nearest Neighbor Classification

LMC: Fast Training of GNNs via Subgraph Sampling with Provable Convergence

Provably Convergent Subgraph-wise Sampling for Fast GNN Training

Adaptive Sampling for Deep Learning via Efficient Nonparametric Proxies

A Kaczmarz Method with Simple Random Sampling for Solving Large Linear Systems

Lsh-sampling Breaks the Computation Chicken-and-egg Loop in Adaptive Stochastic Gradient Estimation

Fast spectral analysis for approximate nearest neighbor search

Efficient Training of Deep Neural Operator Networks via Randomized Sampling

A Fast KNN Algorithm Based on Simulated Annealing

Sparse Sampling Kaczmarz-Motzkin Method with Linear Convergence

Fast Low-rank Metric Learning for Large-scale and High-dimensional Data

Efficient Estimation of k for the Nearest Neighbors Class of Methods

GNNSampler: Bridging the Gap between Sampling Algorithms of GNN and Hardware