Characterizing TPC-H on a Clustered Database Engine from the OS Perspective

Yanyong Zhang, Jianyong Zhang, Anand Sivasubramaniam, Chun Liu, Hubertus Franke
2002-01-01
Abstract: A range of database services are being offered on clusters of workstations today to meet the demanding needs of applications with voluminous datasets, high computational and I/O requirements, and a large number of users. The underlying database engine runs on cost-effective off-the-shelf hardware and software components that may not really be tailored/tuned for these applications. At the same time, many of these databases have legacy codes that may not be easy to modulate based on the evolving capabilities and limitations of clusters. An in-depth understanding of the interaction between these database engines and the underlying operating system (OS) can identify a set of characteristics that would be extremely valuable for future research on systems support for these environments. To our knowledge, there is no prior work that has embarked on such a characterization for a clustered database server. Using a public domain version of a commercial clustered database server and TPC-H-like decision support queries, this paper studies numerous issues by evaluating performance on an off-the-shelf Pentium/Linux cluster connected by Myrinet. The execution profile clearly demonstrates the dominance of the I/O subsystem in the execution, and the importance of the communication subsystem for cluster scalability. In addition to quantifying their importance, this paper provides further details on how these subsystems are exercised by the database engine in terms of characteristics such as request sizes and spatial and temporal distributions. These characteristics provide insight into the benefits of possible optimizations in these subsystems. These include the potential savings from avoiding copies across protection domains during I/O and the potential reduction in the number of messages by employing multicasts. Mechanisms for performing such optimizations are also discussed.