Abstract: This paper introduces annotative indexing, a novel framework that unifies and generalizes traditional inverted indexes, column stores, object stores, and graph databases. As a result, annotative indexing can provide the underlying indexing framework for databases that support knowledge graphs, entity retrieval, semi-structured data, and ranked retrieval. While we primarily focus on human language data in the form of text, annotative indexing is sufficiently general to support a range of other datatypes, and we provide examples of SQL-like queries over a JSON store that includes numbers and dates. Taking advantage of the flexibility of annotative indexing, we also demonstrate a fully dynamic annotative index incorporating support for ACID properties of transactions with hundreds of concurrent readers and writers.

What problem does this paper attempt to address?

This paper addresses the limitations of existing index structures (inverted indexes, column stores, object stores, and graph databases) when processing large-scale human language data. Specifically, these problems include:

1. **Limitations of a single index structure**: The traditional inverted index is mainly used for sparse retrieval in the first stage of information retrieval systems, where the focus is on fast retrieval and low latency. It cannot flexibly support multiple data types or complex query requirements.
2. **Processing of multi-format heterogeneous data**: Existing tools cannot flexibly store, convert, and search unstructured and semi-structured human language data in multiple formats (such as JSON, CSV, and HTML).
3. **Limited support for dynamic update and transactions**: Traditional index structures can usually be updated only in batches or by rebuilding the index. They do not support fine-grained content annotation or efficient dynamic update, especially in scenarios with highly concurrent reads and writes.

To solve these problems, the paper introduces a new framework, **annotative indexing**. Annotative indexing unifies and generalizes traditional inverted indexes, column stores, object stores, and graph databases, and can provide the underlying indexing framework for databases that support knowledge graphs, entity retrieval, semi-structured data, and ranked retrieval. In addition, annotative indexing has the following characteristics:

- **Flexibility**: It can handle multiple data types and is not limited to text data.
- **Dynamic update**: It supports the ACID properties of transactions with many concurrent readers and writers.
- **Efficient query processing**: It processes queries using minimal-interval semantics.

Overall, the paper aims to provide a more general, flexible, and efficient indexing solution through the annotative indexing framework, to meet the complex processing requirements of large-scale human language data.
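To make the idea more concrete, here is a minimal sketch of what an annotation-based index could look like. It is not the paper's implementation. The class `ToyAnnotativeIndex`, its methods, the feature names (`doc:`, `year:`), and the toy data are all hypothetical. It assumes that an annotation attaches a feature and an optional value to an interval of a shared address space, and it reads "minimal-interval semantics" as: for a given feature, no stored interval contains another.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int    # first address covered (inclusive)
    end: int      # last address covered (inclusive)
    value: float  # optional payload (assumption: a single number)


class ToyAnnotativeIndex:
    """Hypothetical in-memory index: feature -> annotations sorted by start."""

    def __init__(self):
        self._lists = defaultdict(list)

    def annotate(self, feature, start, end, value=0.0):
        """Add an annotation, keeping only minimal intervals per feature."""
        kept = []
        for a in self._lists[feature]:
            if start <= a.start and a.end <= end:
                return  # new interval contains an existing one: not minimal
            if a.start <= start and end <= a.end:
                continue  # existing interval contains the new one: drop it
            kept.append(a)
        kept.append(Annotation(start, end, value))
        kept.sort()
        self._lists[feature] = kept

    def tau(self, feature, k):
        """First annotation of `feature` starting at or after address k."""
        lst = self._lists.get(feature, [])
        i = bisect_left(lst, (k,))
        return lst[i] if i < len(lst) else None

    def contained_in(self, inner, outer):
        """Annotations of `inner` lying within some annotation of `outer`.

        Quadratic scan for clarity; a real index would merge sorted lists.
        """
        result = []
        for a in self._lists.get(inner, []):
            for b in self._lists.get(outer, []):
                if b.start <= a.start and a.end <= b.end:
                    result.append(a)
                    break
        return result


# Toy data (made up): two documents laid out in one address space.
index = ToyAnnotativeIndex()
docs = [(2001, "annotative indexing unifies indexes"),
        (2002, "dynamic indexing supports transactions")]
address = 0
for year, text in docs:
    first = address
    for token in text.split():
        index.annotate(token, address, address)  # inverted-index style
        address += 1
    index.annotate("doc:", first, address - 1)   # object-store style
    index.annotate("year:", first, address - 1, float(year))  # column style

print(index.tau("indexing", 2))
print(index.contained_in("unifies", "doc:"))
```

In this sketch, tokens behave like postings in an inverted index, while `doc:` and `year:` behave like object boundaries and a typed column over the same addresses, which is one way to picture the unification described above. Because no interval for a feature contains another, sorting by start address also sorts by end address, which is what makes the binary search in `tau` sufficient.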

Annotative Indexing
