Abstract: OLTP (online transaction processing) database systems are essential building blocks for many popular web services (e.g., Amazon) [13]. Traditionally, OLTP systems were no different from other database management systems (DBMSs): they all ran on popular general-purpose relational DBMSs such as DB2 [9] and Shore [1]. In the past few years, however, a number of multicore main-memory OLTP databases, including Hekaton [6], VoltDB [4], H-Store [10], and Silo [15], have emerged and drawn growing attention for their performance. For example, Silo achieves 700,000 transactions per second on the TPC-C benchmark on a 32-core machine [15], a two-orders-of-magnitude improvement over disk-based relational DBMSs. Silo-R, a recent extension of Silo, adds durability and still achieves 550,000 transactions per second under the same settings [16].

Two factors drive this transition. First, the characteristics of the OLTP market make main memory a desirable choice. OLTP workloads usually involve business data processing [7], which requires high peak throughput. For example, China’s e-commerce giant Alibaba received 278 million orders (worth $9.3 billion) during a 24-hour online shopping festival on November 11, 2014 [14], with its first billion worth of orders placed in less than 20 minutes [7]. Such high throughput might be challenging for current disk-based databases to match. Second, main memory has become sufficiently cheap (and fast) to host OLTP datasets. A modern commercial server typically has several terabytes of RAM [15], which is more than enough for most OLTP workloads. Moreover, the size of OLTP systems usually does not grow exponentially as RAM capacity does, because customers and real-world entities do not obey Moore’s law [8].

Seven years ago, Harizopoulos et al. presented a performance breakdown of Shore running in memory [8].
In this work, we conducted a detailed performance study of Silo [15], a recent state-of-the-art main-memory database. Compared to Shore, eliminating the buffer manager (a centralized page manager) significantly improves the performance of table operations (record accesses), and we found that index operations are now the bottleneck in Silo. In terms of cache performance, misses for table operations occur mostly at the last-level cache, indicating poor locality among table records. The RCU (read-copy-update) regions of the underlying B-tree indices, which support concurrent access, also cause a significant number of last-level write misses. In addition, we examined the overhead of Silo’s OCC (optimistic concurrency control) protocol. We found that under normal workloads, OCC still imposes a significant overhead at commit time, especially when transactions are short and write-intensive. As the workload becomes more skewed or the number of threads grows, OCC overhead starts to dominate, mainly because data read during transaction execution is more likely to be modified by other threads, causing transactions to abort. We also report our findings as we vary the mix of transaction types. For example, we found that the abort rate peaks where 50% of transactions are read transactions and 50% are write transactions.

The remainder of this report is organized as follows. Section 2 gives an overview of our target system, Silo. Section 3 introduces the measurement methodology and the benchmark we use. Section 4 reports our measurement results. Finally, Section 5 concludes the report.
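
To make the abort behavior described above concrete, the following is a minimal sketch of commit-time validation in a generic optimistic concurrency control scheme. It is not Silo's actual protocol: the `Store` and `Transaction` classes, the per-record version counter, and the single-threaded interleaving are all assumptions chosen for illustration. It shows only why a transaction aborts when a record it read is modified by another transaction before it commits.

```python
class Store:
    """Toy versioned key-value store (hypothetical; not Silo's data structure)."""

    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        # Missing keys read as (None, version 0).
        return self.data.get(key, (None, 0))


class Transaction:
    """Toy OCC transaction: buffers writes and validates reads at commit."""

    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version observed at first read
        self.write_set = {}  # key -> new value, buffered until commit

    def get(self, key):
        if key in self.write_set:
            return self.write_set[key]
        value, version = self.store.read(key)
        self.read_set.setdefault(key, version)
        return value

    def put(self, key, value):
        self.write_set[key] = value

    def commit(self):
        # Validation phase: abort if any record we read has changed since.
        # A real multithreaded system must make validation and installation
        # atomic (e.g., by locking the write set); this sketch omits that.
        for key, seen in self.read_set.items():
            if self.store.read(key)[1] != seen:
                return False
        # Write phase: install buffered writes and bump each version.
        for key, value in self.write_set.items():
            _, version = self.store.read(key)
            self.store.data[key] = (value, version + 1)
        return True


store = Store()
setup = Transaction(store)
setup.put("stock", 10)
setup.commit()

# Two transactions read the same record, then both try to update it.
t1 = Transaction(store)
t2 = Transaction(store)
t1.put("stock", t1.get("stock") - 1)
t2.put("stock", t2.get("stock") - 1)
ok1 = t1.commit()  # succeeds: the version is unchanged since t1's read
ok2 = t2.commit()  # aborts: t1 bumped the version after t2's read
```

In this toy model, the more often concurrent transactions touch the same keys (a skewed workload, or more threads), the more often the validation loop finds a changed version, which is consistent with the abort trend reported above.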

Exploiting Single-Threaded Model in Multi-Core In-Memory Systems.

Exploiting Single-Threaded Model in Multi-Core Systems

Towards a Non-2PC Transaction Management in Distributed Database Systems

SA-LSM: Optimize Data Layout for LSM-tree Based Storage using Survival Analysis

Software-Based Lightweight Multithreading to Overlap Memory-Access Latencies of Commodity Processors

An Efficient Design and Implementation of LSM-tree Based Key-Value Store on Open-Channel SSD.

Optimizing LSM-based indexes for disaggregated memory

Performance Study on an In-Memory OLTP Database

CMP Thread Assignment Based on Group Sharing L2 Cache

LosaTM: A Hardware Transactional Memory Integrated with a Low-Overhead Scenario-Awareness Conflict Manager

Scheduling OLTP Transactions via Machine Learning

Efficient Execution of Multiple Queries on Deep Memory Hierarchy

Object-centric bank partition for reducing memory interference in CMP systems.

bCATE: a balanced contention-aware transaction execution model for highly concurrent OLTP systems

Breaking Down Memory Walls: Adaptive Memory Management in LSM-based Storage Systems (Extended Version)

Breaking the Synchronization Bottleneck with Reconfigurable Transactional Execution

A Study of Leveraging Memory Level Parallelism for DRAM System on Multi-core/Many-Core Architecture

A Performance Evaluation of DRAM Access for In-Memory Databases

Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access

Multi-Threading Performance on Commodity Multi-core Processors