Tensor-Train Point Cloud Compression and Efficient Approximate Nearest-Neighbor Search

Georgii Novikov, Alexander Gneushev, Alexey Kadeishvili, Ivan Oseledets
2024-10-06
Abstract: Nearest-neighbor search in large vector databases is crucial for various machine learning applications. This paper introduces a novel method using tensor-train (TT) low-rank tensor decomposition to efficiently represent point clouds and enable fast approximate nearest-neighbor searches. We propose a probabilistic interpretation and utilize density estimation losses like Sliced Wasserstein to train TT decompositions, resulting in robust point cloud compression. We reveal an inherent hierarchical structure within TT point clouds, facilitating efficient approximate nearest-neighbor searches. In our paper, we provide detailed insights into the methodology and conduct comprehensive comparisons with existing methods. We demonstrate its effectiveness in various scenarios, including out-of-distribution (OOD) detection problems and approximate nearest-neighbor (ANN) search tasks.
Computer Vision and Pattern Recognition, Machine Learning
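The abstract's central object, a point cloud stored as TT cores rather than as a dense matrix, can be illustrated with a minimal NumPy sketch. It assumes (this is an illustrative layout, not necessarily the paper's exact one) that the number of points factors as N = n_1 n_2 ⋯ n_d, that core k has shape (r_{k-1}, n_k, r_k) with r_0 = 1, and that a final matrix W of shape (r_d, D) maps the last rank index to the D point coordinates. The helper names `random_tt_point_cloud`, `tt_point` and `tt_full_cloud` are hypothetical.

```python
import numpy as np


def random_tt_point_cloud(mode_sizes, ranks, dim, seed=0):
    """Hypothetical helper: random TT cores for prod(mode_sizes) points in R^dim."""
    rng = np.random.default_rng(seed)
    full_ranks = [1] + list(ranks)  # r_0 = 1, then r_1 .. r_d
    cores = [rng.standard_normal((full_ranks[k], n, full_ranks[k + 1]))
             for k, n in enumerate(mode_sizes)]
    out = rng.standard_normal((full_ranks[-1], dim))  # W: last rank index -> coordinates
    return cores, out


def tt_point(cores, out, multi_index):
    """Decode one point: G_1[:, i_1, :] @ ... @ G_d[:, i_d, :] @ W."""
    v = np.ones(1)
    for core, i in zip(cores, multi_index):
        v = v @ core[:, i, :]
    return v @ out


def tt_full_cloud(cores, out):
    """Materialize all N points as an (N, D) matrix; only sensible for small checks."""
    m = np.ones((1, 1))  # rows: prefixes built so far, columns: current rank index
    for core in cores:
        # (P, r_prev) x (r_prev, n, r_next) -> (P * n, r_next), first mode most significant
        m = np.einsum("pa,anb->pnb", m, core).reshape(-1, core.shape[2])
    return m @ out


cores, out = random_tt_point_cloud([8, 8, 8], [4, 4, 4], dim=16)
cloud = tt_full_cloud(cores, out)  # 512 points in R^16
idx = (3, 5, 1)
flat = np.ravel_multi_index(idx, (8, 8, 8))  # same row order as tt_full_cloud
assert np.allclose(cloud[flat], tt_point(cores, out, idx))
n_params = sum(c.size for c in cores) + out.size
print(n_params, cloud.size)  # 352 8192
```

Storage grows with the total size of the cores rather than with N·D, which is where the compression comes from; whether such cores fit a real point cloud well depends on how they are trained.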
What problem does this paper attempt to address?
This paper addresses nearest-neighbor search in large-scale vector databases. The authors propose using tensor-train (TT) low-rank tensor decomposition to represent point clouds compactly and to enable fast approximate nearest-neighbor (ANN) search.

#### Main problems

1. **Efficient point cloud compression**
   - Large point clouds are expensive to store and to compute with. Standard TT approximation algorithms (such as TT-SVD and TT-cross) do not work well when applied directly to point clouds.
   - The paper instead trains the TT decomposition under a probabilistic interpretation, using density-estimation losses such as the Sliced Wasserstein loss, which yields robust point cloud compression.
2. **Fast approximate nearest-neighbor search**
   - Exact nearest-neighbor search in large, high-dimensional point clouds is time-consuming. The paper exploits the hierarchical structure inherent in TT-format point clouds to design an efficient ANN search algorithm.
   - According to the authors, the method substantially reduces search time and compresses the data heavily while maintaining high accuracy.
3. **Out-of-distribution (OOD) detection**
   - OOD detection is an important task in many machine-learning applications. The paper shows how to apply TT point cloud compression to OOD detection, in particular to distance-based OOD detection methods.
   - Compared with subset-sampling methods such as Coreset, TT point cloud compression is reported to have clear advantages in memory usage and performance.

#### Method overview

- **Tensorizing point clouds**: The point cloud is written as a matrix with one row per point; this matrix is tensorized (reshaped into a higher-order tensor) and represented in low-rank TT form.
- **Probabilistic interpretation and loss functions**: Under a probabilistic interpretation, the TT parameters are trained with the Sliced Wasserstein loss and a nearest-neighbor distance loss, so that the compressed point cloud approximates the distribution of the original one.
- **Hierarchical structure and search algorithm**: The hierarchical structure of TT point clouds is used to build a greedy search algorithm over cluster centers that accelerates ANN search.

Through these methods, the paper addresses both efficient compression and fast search for large-scale point clouds, and reports superior performance in practical applications such as out-of-distribution detection.
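The Sliced Wasserstein loss named in the method overview compares two distributions through many one-dimensional projections, where optimal transport reduces to sorting. Below is a minimal NumPy sketch of a Monte Carlo estimate of the squared sliced 2-Wasserstein distance between two equal-size samples. In the setting described above, `x` would be points decoded from the TT cores at randomly sampled indices and `y` a batch of data points; that pairing, the number of projections and the name `sliced_wasserstein` are illustrative assumptions, and actual training would compute the loss in an autodiff framework so that gradients reach the TT cores.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=128, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    x, y: arrays of shape (n, d) with the same n (equal-weight samples).
    """
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal-size samples of equal dimension")
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((x.shape[1], n_projections))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)  # random unit directions
    # Project onto each direction; sorting matches quantiles, which is the exact
    # optimal transport plan between two 1-D empirical distributions.
    px = np.sort(x @ theta, axis=0)
    py = np.sort(y @ theta, axis=0)
    return float(np.mean((px - py) ** 2))


rng = np.random.default_rng(1)
a = rng.standard_normal((256, 8))
c = rng.standard_normal((256, 8))        # same distribution as a
b = rng.standard_normal((256, 8)) + 3.0  # shifted distribution
print(sliced_wasserstein(a, a))  # 0.0
print(sliced_wasserstein(a, c) < sliced_wasserstein(a, b))  # True
```

Sorting makes each projection cost O(n log n), which is one reason sliced distances are attractive as training losses compared with solving a full optimal transport problem in d dimensions.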
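The hierarchical structure mentioned in the method overview can be made concrete. Fixing the first k indices (i_1, ..., i_k) selects a cluster of all points sharing that prefix, and because each point is a multilinear product of cores, the cluster centroid equals the prefix product times the per-core averages of the remaining cores. The NumPy sketch below uses those centroids for a greedy descent (beam width 1), choosing at each level the index whose centroid is closest to the query. It assumes cores of shape (r_{k-1}, n_k, r_k) plus a final (r_d, D) matrix; the function names, the beam width and the squared-Euclidean metric are illustrative assumptions and may differ from the paper's algorithm. It evaluates n_1 + ⋯ + n_d centroids instead of all n_1 ⋯ n_d points and, being greedy, can return a point that is not the exact nearest neighbor.

```python
import numpy as np


def tt_full_cloud(cores, out):
    """Materialize all points (rows ordered like np.ravel_multi_index); for checks only."""
    m = np.ones((1, 1))
    for core in cores:
        m = np.einsum("pa,anb->pnb", m, core).reshape(-1, core.shape[2])
    return m @ out


def greedy_tt_search(cores, out, query):
    """Greedy descent over TT cluster centroids; returns (multi_index, point)."""
    d = len(cores)
    # suffix[k]: average of cores[k] @ ... @ cores[d-1] @ out over their point indices
    suffix = [None] * (d + 1)
    suffix[d] = out
    for k in range(d - 1, -1, -1):
        suffix[k] = cores[k].mean(axis=1) @ suffix[k + 1]
    prefix = np.ones(1)  # product of the core slices chosen so far (r_0 = 1)
    chosen = []
    for k, core in enumerate(cores):
        cand = np.einsum("a,anb->nb", prefix, core)  # one extended prefix per choice of i_k
        centroids = cand @ suffix[k + 1]              # cluster centroids, shape (n_k, D)
        best = int(np.argmin(np.sum((centroids - query) ** 2, axis=1)))
        chosen.append(best)
        prefix = cand[best]
    return tuple(chosen), prefix @ out


rng = np.random.default_rng(0)
sizes, full_ranks, dim = (4, 4, 4), (1, 3, 3, 3), 5
cores = [rng.standard_normal((full_ranks[k], n, full_ranks[k + 1]))
         for k, n in enumerate(sizes)]
out = rng.standard_normal((full_ranks[-1], dim))
query = rng.standard_normal(dim)

idx, point = greedy_tt_search(cores, out, query)
cloud = tt_full_cloud(cores, out)  # 64 points, brute force only for comparison
exact = np.linalg.norm(cloud - query, axis=1).min()
# The greedy answer can never be closer than the exact nearest neighbor.
print(idx, np.linalg.norm(point - query) >= exact - 1e-9)
```

A wider beam (keeping several best prefixes per level) is a natural way to trade extra centroid evaluations for a higher chance of reaching the true nearest neighbor.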