A Generic Approach For Bulk Loading Trie-Based Index Structures On External Storage

Dongzhe Ma, Jianhua Feng
DOI: https://doi.org/10.1007/978-3-319-08010-9_8
2014-01-01
Abstract: A wide range of applications require efficient management of sorted data on external storage. Recently, trie-based data structures have attracted much attention from academia as a competitive alternative to the ubiquitous B-tree. In this paper, we present a novel approach for bulk loading disk-based trie structures (a.k.a. B-tries). Our algorithm first sorts the raw data and then builds the B-trie directly from the sorted data. Data in the output structure are compacted and physically ordered, which enables efficient sequential access. We test the proposed algorithm with both real-world and synthetic datasets. Experimental results show that our algorithm dramatically outperforms the baseline insertion method when the dataset is large enough and is almost always superior to the basic sort-and-insert algorithm.
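The sort-then-build idea can be illustrated with a minimal in-memory sketch. This is not the paper's B-trie algorithm: it ignores disk pages, node splitting, and compaction, and all names (`TrieNode`, `bulk_load`, `scan`) are hypothetical. It only shows why sorted input helps: each key shares a prefix with its predecessor, so the builder can keep the current rightmost path and append new nodes to it without ever searching from the root.

```python
class TrieNode:
    """Hypothetical in-memory trie node (stand-in for a disk-resident node)."""
    def __init__(self):
        self.children = {}     # dict keeps insertion order, which is sorted here
        self.terminal = False  # True if a key ends at this node


def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def bulk_load(keys):
    """Build a trie in one pass over the sorted, de-duplicated keys."""
    root = TrieNode()
    path = [root]  # path[d] is the node at depth d on the rightmost path
    prev = ""
    for key in sorted(set(keys)):
        d = common_prefix_len(prev, key)
        del path[d + 1:]           # drop nodes below the shared prefix
        node = path[d]
        for ch in key[d:]:         # append only the new suffix
            child = TrieNode()
            node.children[ch] = child
            path.append(child)
            node = child
        node.terminal = True
        prev = key
    return root


def scan(node, prefix=""):
    """Yield all stored keys in sorted order (sequential access)."""
    if node.terminal:
        yield prefix
    for ch, child in node.children.items():
        yield from scan(child, prefix + ch)
```

For example, `list(scan(bulk_load(["cat", "car", "cart", "dog", "car"])))` returns the keys in sorted order with the duplicate removed. Because nodes are only ever appended along the rightmost path, a disk-based variant could in principle write finished nodes out sequentially, which is the property the abstract attributes to the bulk-loaded structure.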