HM: A Column-Oriented MapReduce System on Hybrid Storage
Sai Wu, Gang Chen, Ke Chen, Feng Li, Lidan Shou
DOI: https://doi.org/10.1109/TKDE.2015.2453961
IF: 9.235
2015-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: The solid-state hybrid drive (SSHD) incorporates a small NAND flash memory into a hard drive, resulting in an integrated device that combines Hard Disk Drive (HDD) and Solid State Disk (SSD) storage. By identifying the data most critical to performance and buffering it in the SSD part, an SSHD can deliver better performance than a standard hard drive. Exploiting this, however, requires a significant redesign of existing data processing systems. In this paper, we examine the problem of efficiently processing relational data using MapReduce on a cluster that uses SSHDs as the underlying storage devices. We present the design of Hybrid MapReduce (HM), a column-oriented MapReduce system whose storage layout, query optimizer, data index, and compression algorithm differ from those of previous MapReduce systems. In HM, the Distributed File System (DFS) is deployed on SSHDs, and the data layout (how data chunks are distributed across HDDs and SSDs) plays a key role in performance. Hence, an approximate algorithm is used to tune the data layout adaptively to maximize query performance. We evaluate HM using the TPC-H benchmark, and the results show that, with our new design, the hybrid system can provide performance similar to that of an SSD-only system.
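The abstract does not say which approximate algorithm HM uses to tune the data layout. As a rough illustration of the underlying problem only, the sketch below treats chunk placement as a knapsack problem and applies a simple greedy heuristic: chunks are ranked by estimated time saved per megabyte and placed on the SSD until its capacity is exhausted. This is not the paper's method. The `Chunk` type, the `greedy_layout` function, the `saved_ms` benefit estimate, and all sizes and timings are hypothetical values invented for the example; only the column names come from the TPC-H schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    name: str        # hypothetical column-chunk identifier
    size_mb: int     # space the chunk would occupy on the SSD (must be > 0)
    saved_ms: float  # assumed estimate of query time saved if the chunk is on SSD


def greedy_layout(chunks, ssd_capacity_mb):
    """Choose chunks for the SSD by descending benefit density (saved_ms / size_mb).

    A generic greedy knapsack heuristic, used here as a stand-in; it is not
    the layout algorithm described in the paper.
    """
    ranked = sorted(chunks, key=lambda c: c.saved_ms / c.size_mb, reverse=True)
    on_ssd, used = [], 0
    for c in ranked:
        # Place the chunk on the SSD only if it still fits.
        if used + c.size_mb <= ssd_capacity_mb:
            on_ssd.append(c.name)
            used += c.size_mb
    # Everything that did not fit stays on the HDD part.
    on_hdd = [c.name for c in chunks if c.name not in on_ssd]
    return on_ssd, on_hdd


# Illustrative, made-up workload statistics.
chunks = [
    Chunk("lineitem.l_shipdate", 40, 900.0),
    Chunk("lineitem.l_comment", 200, 300.0),
    Chunk("orders.o_orderdate", 20, 500.0),
    Chunk("part.p_name", 60, 240.0),
]
ssd, hdd = greedy_layout(chunks, ssd_capacity_mb=100)
print("SSD:", ssd)
print("HDD:", hdd)
```

An adaptive variant would re-estimate `saved_ms` from the observed query workload and rerun the placement periodically; how HM actually does this is not stated in the abstract.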