An Efficient Approach for Building Compressed Full-Text Index for Structured Data

Jun Liang, Lin Xiao, Di Zhang
DOI: https://doi.org/10.1109/iccit.2009.42
2009-01-01
Abstract: The self-index is a highly compressed, self-contained full-text index. It is designed for indexing plain text in order to reduce its permanent storage and to enhance search performance. Beyond being a sequence of characters, text usually has a specific internal structure. The data record, as a basic model of structured data, is therefore widely employed to represent and organize such data. In this paper, we design and implement an approach to building the self-index for data records via a text medium. Our approach indexes the data records through an intermediate text that holds aligned record fields by stuffing delimiters between them. Through theoretical analysis, we give worst-case upper bounds on the permanent space required by our approach. In addition, we report a series of experimental results that validate the correctness and efficiency of the proposed approach.
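The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea under stated assumptions. Records are serialized into one intermediate text with hypothetical delimiters (ASCII unit separator between fields, ASCII record separator after each record), the text is indexed, and each match is mapped back to the record and field that contain it. A plain, uncompressed suffix array stands in for the compressed self-index (a real implementation would use something like an FM-index or a compressed suffix array). The function names `build_text`, `suffix_array`, `find`, and `search_field` are illustrative, not from the paper.

```python
from bisect import bisect_right

# Hypothetical delimiters; the paper does not say which characters it stuffs between fields.
FIELD_SEP = "\x1f"   # ASCII unit separator, placed between fields of a record
RECORD_SEP = "\x1e"  # ASCII record separator, placed after the last field


def build_text(records):
    """Serialize records into one text; return it with (offset, record_id, field_id) per field."""
    parts, starts, pos = [], [], 0
    for r, record in enumerate(records):
        for f, value in enumerate(record):
            if FIELD_SEP in value or RECORD_SEP in value:
                raise ValueError("field value contains a delimiter")
            starts.append((pos, r, f))
            sep = FIELD_SEP if f < len(record) - 1 else RECORD_SEP
            parts.append(value + sep)
            pos += len(value) + 1
    return "".join(parts), starts


def suffix_array(text):
    """Naive suffix array: suffix start offsets in lexicographic order (O(n^2 log n))."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find(text, sa, pattern):
    """Return the sorted start offsets of all occurrences of pattern via binary search."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # leftmost suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # leftmost suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def search_field(text, sa, starts, pattern, field_id):
    """Ids of records whose field `field_id` contains pattern (pattern must be delimiter-free)."""
    if not pattern or FIELD_SEP in pattern or RECORD_SEP in pattern:
        raise ValueError("pattern must be non-empty and delimiter-free")
    offsets = [s[0] for s in starts]
    hits = set()
    for off in find(text, sa, pattern):
        # A delimiter-free match cannot cross a delimiter, so it lies in the field starting at or before off.
        _, r, f = starts[bisect_right(offsets, off) - 1]
        if f == field_id:
            hits.add(r)
    return sorted(hits)
```

For example, with `records = [("alice", "london"), ("bob", "paris"), ("carol", "lisbon")]`, building `text, starts = build_text(records)` and `sa = suffix_array(text)`, the call `search_field(text, sa, starts, "on", 1)` returns `[0, 2]`. Because every field boundary is marked by a delimiter that never occurs in a pattern, a single full-text index over the intermediate text is enough to answer field-restricted queries; only the small table of field offsets is kept alongside it.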