Efficient $k$-NN Search in IoT Data: Overlap Optimization in Tree-Based Indexing Structures

Ala-Eddine Benrazek, Zineddine Kouahla, Brahim Farou, Hamid Seridi, Ibtissem Kemouguette
2024-08-29
Abstract: The proliferation of interconnected devices in the Internet of Things (IoT) has led to an exponential increase in data, commonly known as Big IoT Data. Efficient retrieval of this heterogeneous data demands a robust indexing mechanism for effective organization. However, a significant challenge remains: the overlap in data space partitions during index construction. This overlap increases node access during search and retrieval, resulting in higher resource consumption, performance bottlenecks, and impeded system scalability. To address this issue, we propose three innovative heuristics designed to quantify and strategically reduce data space partition overlap. The volume-based method (VBM) offers a detailed assessment by calculating the intersection volume between partitions, providing deeper insights into spatial relationships. The distance-based method (DBM) enhances efficiency by using the distance between partition centers and radii to evaluate overlap, offering a streamlined yet accurate approach. Finally, the object-based method (OBM) provides a practical solution by counting objects across multiple partitions, delivering an intuitive understanding of data space dynamics. Experimental results demonstrate the effectiveness of these methods in reducing search time, underscoring their potential to improve data space partitioning and enhance overall system performance.
Databases, Artificial Intelligence, Information Retrieval, Performance
What problem does this paper attempt to address?
This paper addresses the low search efficiency caused by overlapping data space partitions in tree-based index structures for Big IoT Data. Overlap between partitions increases the number of nodes visited during a search, which raises resource consumption, creates performance bottlenecks, and hinders system scalability. To address this, the authors propose three heuristics that quantify and strategically reduce partition overlap. In the formulas, $d(p_1, p_2)$ is the distance between the two partition centers and $r_1$, $r_2$ are the partition radii.

1. **Volume-Based Method (VBM)**:
   - Evaluates the overlap in detail by calculating the intersection volume between partitions.
   - The formulas are as follows:

     \[
     V = \begin{cases}
     0 & \text{if } d(p_1, p_2) \geq r_1 + r_2 \\
     \min\{V_{P1}, V_{P2}\} & \text{if } d(p_1, p_2) \leq |r_1 - r_2| \\
     \sum_{i = 1}^{2} V_\triangle(r_i, p_i) & \text{otherwise}
     \end{cases}
     \]

     \[
     V_{\cap} = \begin{cases}
     0 & \text{if } d(p_1, p_2) \geq r_1 + r_2 \\
     1 & \text{if } d(p_1, p_2) \leq |r_1 - r_2| \\
     \frac{\sum_{i = 1}^{2} V_\triangle(r_i, p_i)}{V_{P1} + V_{P2}} & \text{otherwise}
     \end{cases}
     \]

2. **Distance-Based Method (DBM)**:
   - Evaluates the overlap from the distance between partition centers and their radii, a simpler computation that remains accurate.
   - The formula is as follows:

     \[
     D_{\cap} = \begin{cases}
     0 & \text{if } d(p_1, p_2) \geq r_1 + r_2 \\
     1 & \text{if } d(p_1, p_2) \leq |r_1 - r_2| \\
     \frac{\sum_{i = 1}^{2} h_i}{d(p_1, p_2)} & \text{otherwise}
     \end{cases}
     \]

3. **Object-Based Method (OBM)**:
   - Evaluates the overlap by counting the objects shared between partitions, giving a practical and intuitive measure.
   - The formula is as follows:

     \[
     A_{\cap} = \begin{cases}
     0 & \text{if } d(p_1, p_2) \geq r_1 + r_2 \\
     1 & \text{if } d(p_1, p_2) \leq |r_1 - r_2| \\
     \frac{|A|}{|P1| + |P2|} & \text{otherwise}
     \end{cases}
     \]

All three measures share the same case split: the overlap is 0 when the partitions are disjoint and maximal when one partition lies inside the other. According to the reported experiments, these methods reduce search time and improve data space partitioning.
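The case split shared by the formulas above can be illustrated with a minimal Python sketch of the object-based measure. This is not the authors' implementation. It assumes that each partition is a ball with a center and a radius under Euclidean distance, that $|P1|$ and $|P2|$ are the numbers of objects assigned to each partition, and that $A$ is the set of objects from either partition that fall inside both balls. The function name `obm_overlap` and its parameters are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def obm_overlap(c1: Point, r1: float, objs1: Sequence[Point],
                c2: Point, r2: float, objs2: Sequence[Point]) -> float:
    """Object-based overlap ratio between two ball-shaped partitions.

    Assumption (not stated in the source): A is the set of objects,
    taken from both partitions, that lie inside both balls.
    """
    d = math.dist(c1, c2)  # distance between the partition centers

    # Case 1: the balls are disjoint (or only touch), so no overlap.
    if d >= r1 + r2:
        return 0.0

    # Case 2: one ball lies entirely inside the other, so full overlap.
    if d <= abs(r1 - r2):
        return 1.0

    # Case 3: partial overlap, count the objects inside both balls.
    total = len(objs1) + len(objs2)
    if total == 0:
        return 0.0  # guard against division by zero for empty partitions
    shared = [
        o for o in list(objs1) + list(objs2)
        if math.dist(o, c1) <= r1 and math.dist(o, c2) <= r2
    ]
    return len(shared) / total


# Two unit balls whose centers are 1.5 apart; two of four objects are shared.
left = [(0.0, 0.0), (0.8, 0.0)]
right = [(1.5, 0.0), (0.7, 0.0)]
ratio = obm_overlap((0.0, 0.0), 1.0, left, (1.5, 0.0), 1.0, right)
print(ratio)
```

The two early returns depend only on the centers and radii, so the object scan runs only for partially overlapping partitions. The volume-based and distance-based measures could reuse the same skeleton with a different third case.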