(Nearest) Neighbors You Can Rely On: Formally Verified k-d Tree Construction and Search in Coq

Nadeem Abdul Hamid
2023-11-18
Abstract: The k-d tree is a classic binary space-partitioning tree used to organize points in k-dimensional space. While used in computational geometry and graphics, the data structure has a long history of application in nearest neighbor search. The nearest neighbor search problem, whose objective is to efficiently find the closest point(s) to a given query point, is in turn the basis of common machine learning techniques. We present in this paper a case study in the certified implementation, using the Coq proof assistant, of k-d tree construction from a set of data and the accompanying K-nearest neighbors search algorithm. Our experience demonstrates an intuitive method for specifying properties of these algorithms using the notion of list permutations.
Logic in Computer Science, Computational Geometry
What problem does this paper attempt to address?
The paper attempts to address the problem of ensuring the correctness and reliability of $k$-d tree construction and nearest neighbor search algorithms through formal verification. Specifically:

- **Problem Background**: The nearest neighbor (NN) search problem involves finding the data point closest to a given query point in a $k$-dimensional space. It has widespread applications in fields such as machine learning, classification, recommendation systems, computer vision, and image retrieval. The classic $k$-d tree makes the search efficient by supporting a branch-and-bound strategy that skips subtrees which cannot contain a closer point.
- **Research Objectives**: The authors use the Coq proof assistant to formally verify the construction of $k$-d trees and the $K$-nearest neighbors search algorithm (here $k$ is the dimension of the space and $K$ the number of neighbors requested). This includes:
  - Implementing and verifying a variant of the quickselect algorithm for finding the median during tree construction;
  - Implementing and verifying the construction of traditional $k$-d tree structures and their nearest neighbor search algorithms;
  - Implementing and verifying a general algorithm for $K$-nearest neighbors search using $k$-d trees and bounded priority queues.
- **Motivation and Context**: As society increasingly relies on applications built on core algorithms such as nearest neighbor search, formal verification of those algorithms becomes particularly important. Safety-critical applications such as autonomous driving, medical imaging, and network security depend not only on low-level security features but also on high-level correctness and accuracy properties. Machine learning systems in particular are largely treated as black boxes, with performance and accuracy tested only empirically.

In summary, this paper aims to establish the correctness of $k$-d tree construction and $K$-nearest neighbors search through formal verification in Coq, thereby strengthening the reliability of application systems built on these algorithms.
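
To make the algorithms named above concrete, the following is a minimal Python sketch of median-split $k$-d tree construction and $K$-nearest neighbors search with a bounded priority queue. It is an illustration only, not the paper's verified Coq development: the names `Node`, `build`, `sq_dist`, and `knn` are hypothetical, squared Euclidean distance is an assumed metric, and a full sort stands in for the quickselect-based median step.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class Node:
    """One k-d tree node: a point, its splitting axis, and two subtrees."""

    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting at the median along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    # Assumption: a full sort replaces the quickselect median for brevity.
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed metric for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(root: Optional[Node], query: Point, k: int) -> List[Point]:
    """Return up to k points closest to query, nearest first."""
    # Bounded priority queue: a max-heap (via negated distances) of size <= k.
    heap: List[Tuple[float, Point]] = []

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d = sq_dist(node.point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.point))
        elif d < -heap[0][0]:
            # Closer than the current worst candidate: swap it in.
            heapq.heapreplace(heap, (-d, node.point))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Branch and bound: cross the splitting plane only if the queue is
        # not full or the plane is closer than the current worst candidate.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    if k > 0:
        visit(root)
    return [p for _, p in sorted(heap, key=lambda e: -e[0])]
```

For instance, with the six points `(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)` and the query `(9, 2)`, `knn(build(points), (9, 2), 2)` returns `[(8, 1), (7, 2)]`, and the left subtree of the root is never visited. A natural check, in the spirit of the permutation-based specifications the abstract mentions, is that the result agrees with sorting all points by distance to the query and taking the first $K$.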