Annotative Indexing

Charles L. A. Clarke
2024-11-10
Abstract: This paper introduces annotative indexing, a novel framework that unifies and generalizes traditional inverted indexes, column stores, object stores, and graph databases. As a result, annotative indexing can provide the underlying indexing framework for databases that support knowledge graphs, entity retrieval, semi-structured data, and ranked retrieval. While we primarily focus on human language data in the form of text, annotative indexing is sufficiently general to support a range of other datatypes, and we provide examples of SQL-like queries over a JSON store that includes numbers and dates. Taking advantage of the flexibility of annotative indexing, we also demonstrate a fully dynamic annotative index incorporating support for ACID properties of transactions with hundreds of multiple concurrent readers and writers.
Information Retrieval
What problem does this paper attempt to address?
This paper addresses the limitations of existing index structures (inverted indexes, column stores, object stores, and graph databases) when processing large-scale human language data. Specifically:

1. **Limitations of a single index structure**: The traditional inverted index is used mainly for sparse retrieval in the first stage of an information retrieval system, where fast retrieval and low latency are the priority. It cannot flexibly support multiple data types or complex query requirements.
2. **Processing of multi-format heterogeneous data**: Existing tools cannot flexibly store, convert, and search unstructured and semi-structured human language data in multiple formats (such as JSON, CSV, and HTML).
3. **Limited support for dynamic updates and transactions**: Traditional index structures can usually only be updated in batches or rebuilt from scratch. They support neither fine-grained content annotation nor efficient dynamic updates, especially in scenarios with highly concurrent reads and writes.

To solve these problems, the paper introduces a new framework, **annotative indexing**. Annotative indexing unifies and generalizes traditional inverted indexes, column stores, object stores, and graph databases, and can provide the underlying indexing framework for databases that support knowledge graphs, entity retrieval, semi-structured data, and ranked retrieval. Annotative indexing also has the following characteristics:

- **Flexibility**: It can handle multiple data types and is not limited to text.
- **Dynamic update**: It supports the ACID properties of transactions, allowing many concurrent readers and writers to update the index efficiently.
- **Efficient query processing**: It optimizes query performance through minimal-interval semantics.
Overall, the paper aims to provide a more general, flexible, and efficient indexing solution through the annotative indexing framework, one that meets the complex processing requirements of large-scale human language data.
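The summary mentions minimal-interval semantics without defining it. As a rough illustration only, the sketch below assumes the common meaning in interval-based retrieval: a result set keeps only those intervals that do not contain another matching interval. The `Interval` type and the `minimal_intervals` function are hypothetical names introduced here; they are not taken from the paper or from any library.

```python
from typing import List, Tuple

# Hypothetical representation: an inclusive (start, end) pair of token positions.
Interval = Tuple[int, int]


def minimal_intervals(intervals: List[Interval]) -> List[Interval]:
    """Return the intervals that do not contain any other distinct input interval."""
    # Remove duplicates, then sort by end ascending and, for ties, by start descending.
    # With this order, any interval nested inside the current one has already been seen.
    ordered = sorted(set(intervals), key=lambda iv: (iv[1], -iv[0]))

    result: List[Interval] = []
    max_start = None  # largest start among the intervals kept so far
    for start, end in ordered:
        # An earlier interval ends no later than this one. If it also starts
        # no earlier, it is nested inside this interval, so this one is not minimal.
        if max_start is not None and max_start >= start:
            continue
        result.append((start, end))
        max_start = start
    return result


example = [(1, 10), (2, 5), (3, 4), (6, 8), (7, 12)]
reduced = minimal_intervals(example)
print(reduced)
```

For the example input, `(1, 10)` and `(2, 5)` are dropped because each contains `(3, 4)`, leaving `[(3, 4), (6, 8), (7, 12)]`. One consequence of this reduction is that the surviving intervals are ordered by both start and end, which is the kind of property an index can exploit when merging lists; whether the paper relies on it in exactly this way is not stated in the summary.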