HTAP Databases: A Survey

Chao Zhang,Guoliang Li,Jintao Zhang,Xinning Zhang,Jianhua Feng

DOI: https://doi.org/10.1109/TKDE.2024.3389693

2024-04-24

Abstract: Since Gartner coined the term Hybrid Transactional and Analytical Processing (HTAP), numerous HTAP databases have been proposed to combine transactions with analytics in order to enable real-time data analytics for various data-intensive applications. HTAP databases typically process the mixed workloads of transactions and analytical queries in a unified system by leveraging both a row store and a column store. As there are different storage architectures and processing techniques to satisfy various requirements of diverse applications, it is critical to summarize the pros and cons of these key techniques. This paper offers a comprehensive survey of HTAP databases. We mainly classify state-of-the-art HTAP databases according to four storage architectures: (a) Primary Row Store and In-Memory Column Store; (b) Distributed Row Store and Column Store Replica; (c) Primary Row Store and Distributed In-Memory Column Store; and (d) Primary Column Store and Delta Row Store. We then review the key techniques in HTAP databases, including hybrid workload processing, data organization, data synchronization, query optimization, and resource scheduling. We also discuss existing HTAP benchmarks. Finally, we provide the research challenges and opportunities for HTAP techniques.

Databases

What problem does this paper attempt to address?

This paper surveys Hybrid Transactional/Analytical Processing (HTAP) databases. HTAP combines Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) in one system so that analytics can run in real time on transactional data, eliminating the need for ETL processes. As data-intensive applications in areas such as finance, e-commerce, and fraud detection grow, HTAP is becoming increasingly important. HTAP databases typically pair a row store with a column store to serve both workload types, but individual systems adopt different storage strategies and techniques depending on application priorities for availability, scalability, performance, and data freshness. The paper classifies existing HTAP databases into four storage architectures and reviews the key techniques they rely on, which are also the main challenges: hybrid workload processing, data organization, data synchronization, query optimization, and resource scheduling. It also discusses existing HTAP benchmarks. Overall, the paper aims to help readers understand and evaluate HTAP architectures and techniques, including their advantages and disadvantages for different application requirements, and it proposes future research directions for improving data processing efficiency and real-time analytical capabilities.
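To make the row store plus column store idea concrete, the following is a minimal sketch of a toy store in which transactional writes go to a row store and a delta log, and analytical scans read a column store that is refreshed by an explicit synchronization step. It is an illustration only and does not reproduce any system covered by the survey; the class `ToyHTAPStore`, its methods, and the `fresh` flag are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyHTAPStore:
    """Hypothetical toy model: row store for OLTP, column store for OLAP."""

    columns: tuple[str, ...]
    row_store: dict[int, dict[str, Any]] = field(default_factory=dict)
    column_store: dict[str, list[Any]] = field(default_factory=dict)
    delta: list[tuple[int, dict[str, Any]]] = field(default_factory=list)
    _position: dict[int, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # One value list per column; all lists stay the same length.
        for name in self.columns:
            self.column_store.setdefault(name, [])

    def upsert(self, key: int, row: dict[str, Any]) -> None:
        """OLTP path: update the row store and record the change in the delta."""
        self.row_store[key] = dict(row)
        self.delta.append((key, dict(row)))

    def sync(self) -> int:
        """Merge pending delta entries into the column store; return how many."""
        merged = len(self.delta)
        for key, row in self.delta:
            pos = self._position.get(key)
            if pos is None:
                # New key: append one value to every column.
                self._position[key] = len(self._position)
                for name in self.columns:
                    self.column_store[name].append(row[name])
            else:
                # Existing key: overwrite the values in place.
                for name in self.columns:
                    self.column_store[name][pos] = row[name]
        self.delta.clear()
        return merged

    def column_sum(self, name: str, fresh: bool = False) -> float:
        """OLAP path: scan one column, optionally syncing first for freshness."""
        if fresh:
            self.sync()
        return sum(self.column_store[name])


store = ToyHTAPStore(columns=("amount", "qty"))
store.upsert(1, {"amount": 10.0, "qty": 1})
store.upsert(2, {"amount": 5.0, "qty": 3})
stale = store.column_sum("amount")              # 0: the delta is not merged yet
fresh = store.column_sum("amount", fresh=True)  # 15.0: delta merged before the scan
```

The gap between `stale` and `fresh` is the data freshness trade-off in miniature: syncing on every query gives up-to-date answers at extra cost, while deferring the merge keeps scans cheap but lets them lag behind the transactional state.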
