Abstract:The demand of efficient data structures for query processing on massive data sets has grown tremendously in the past decades. Traditionally, data structures are designed and analyzed in the RAM model, where each memory cell can be accessed with unit cost. This assumption, however, is unrealistic for modeling modern memory hierarchies which consist of many levels of memories and caches with different sizes and access costs. As a consequence, a number of more elaborate models were introduced. Among them the most successful ones are the I/O model and the cache-oblivious model. In recent years, designing data structures that are I/O-efficient or cache-oblivious has become an active direction in both the theory and database communities.This thesis starts by considering the dictionary problem, one of the most basic data structure problems, in which we want to store and access a set of (key, data) pairs. Hash tables are the most efficient and the most fundamental data structure for implementing a dictionary. In this thesis we study hash tables in both the I/O model and the cache-oblivious model. We first show an inherent query-insertion tradeoff of hashing in the I/O model, which implies that the buffering technique is essentially useless for hash tables. In the cache-oblivious model, we build a hash table that achieves the same search cost as its cache-aware version does, for all block sizes.The second problem studied is another fundamental data structure problem, priority queues. The priority queue problem is well understood in the comparison based I/O model, where its complexity is known to be the same as sorting. In this thesis, we establish their equivalence in the I/O model without any restrictions, by providing a reduction from priority queues to sorting. Note that the other direction of the reduction is trivial.The third problem studied in this thesis is the summary query problem, which is a natural generalization of the range searching problem. Our goal is to design data structures that allow for extracting a statistical summary of all the records in the query range. The summaries we support include frequent items, quantiles, various sketches, and wavelets, all of which are of central importance in massive data analysis.

Cache-Oblivious Hashing

Dynamic External Hashing: The Limit of Buffering

Optimal Hashing in External Memory

Seeds of SEED: H 2 Cache: Building a Hybrid Randomized Cache Hierarchy for Mitigating Cache Side-Channel Attacks

Iceberg Hashing: Optimizing Many Hash-Table Criteria at Once

Rectangular Hash Table: Bloom Filter And Bitmap Assisted Hash Table With High Speed

Better GPU Hash Tables

GPU Lock-Free Hopscotch Hash Table

DHash: Enabling Dynamic and Efficient Hash Tables

DHash: Dynamic Hash Tables With Non-Blocking Regular Operations

Hash table in Chinese chess

One Slow Memory Access Hash Table ∗

DyCuckoo: Dynamic Hash Tables on GPUs.

Lock-free Dynamic Hash Tables with Open Addressing.

Deterministic and Efficient Hash Table Lookup Using Discriminated Vectors

Shifting Hash Table: An Efficient Hash Table With Delicate Summary

A Hash Table Without Hash Functions, and How to Get the Most Out of Your Random Bits

A Chunk-Based Hash Table Caching Method for In-Memory Hash Joins.

Fast Approximate Hash Table Using Extended Counting Bloom Filter

Classic and new data structure problems in external memory

Towards the design of efficient hash-based indexing scheme for growing databases on non-volatile memory