Determine the Hardware Choice to Improve HDFS Performance Deployed in a Commodity Cluster

Youwei Wang, Ge Fu, Weiping Wang, Xinran Liu, Can Ma, Dan Meng
DOI: https://doi.org/10.1109/cse.2013.192
2013-01-01
Abstract: The importance of storing and processing data efficiently is strongly emphasized in modern information technology infrastructures. Hadoop Distributed File System (HDFS) acts as the primary storage in modern cloud service environments and has been widely adopted for its portability and fault tolerance. Current deployments of HDFS, which run on top of commodity hardware, are unable to deliver desirable performance in terms of both latency and throughput. For data-intensive applications, I/O pressure is exacerbated as the amount of data being stored and replicated to HDFS increases. To process extremely large volumes of data, investing in high-end hardware is one available option. The primary contribution of this paper is to determine the I/O bottleneck for HDFS using both hardware and software approaches and hence suggest corresponding solutions. Benchmarks and productivity tools are used to evaluate the proposed improvement measures. The final conclusion about the crucial factor in HDFS I/O performance is drawn from the experimental results.