A Proposed Approach for Improving Hadoop Performance for Handling Small Files

Arnab Karan, Siddharth Swarup Rautaray, Manjusha Pandey
DOI: https://doi.org/10.1007/978-981-13-1498-8_28
2018-09-02
Abstract: As the world becomes digitized, data pours in from many sources and in many formats at a speed that traditional systems cannot keep up with, so they can no longer compute and analyze it; data of this kind is called big data. To analyze and process big data properly, tools such as Hadoop, which is open-source software, are used. Hadoop stores and processes data in a distributed environment. Big data matters because it plays a large part in generating benefits for today's businesses: it captures and analyzes a company's wealth of information and quickly converts it into actionable insights. However, when a huge number of small files must be stored and accessed, a bottleneck arises in the name node of Hadoop. In this work, we propose a method to optimize the operation of the name node by eliminating the bottleneck caused by massive numbers of small files.