Data Management Techniques in Hadoop Framework for Handling Small Files: A Survey

Vijay Shankar Sharma, N. C. Barwar
DOI: https://doi.org/10.1007/978-981-15-4936-6_48
2020-09-17
Abstract: Hadoop is an open-source software framework that offers cost-efficient solutions for storing, managing, and analyzing large amounts of data; it provides distributed processing and storage of huge data sets across thousands of computers. Hadoop has two main components, HDFS and MapReduce. HDFS handles large files easily, but its performance degrades when handling small files. NameNode memory is used inefficiently because a vast number of small files must be stored, and data placement does not consider the correlation between them; as a result, the overall performance of Hadoop suffers. This paper provides a comparative study of various efficient data management techniques for handling small files in the Hadoop framework.
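The NameNode memory pressure described in the abstract can be illustrated with a back-of-the-envelope estimate. The sketch below is not taken from the paper: it assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file or block) and a 128 MiB block size, and the function name `namenode_bytes` is hypothetical. Real heap usage varies with Hadoop version and configuration.

```python
# Rough NameNode heap estimate for the small-files problem.
# ASSUMPTIONS (not from the paper): ~150 bytes of heap per namespace
# object, and a 128 MiB HDFS block size.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024


def namenode_bytes(num_files: int, file_size: int) -> int:
    """Estimate heap used to track num_files files of file_size bytes each."""
    # Ceiling division; every file is assumed to occupy at least one block.
    blocks_per_file = max(1, -(-file_size // BLOCK_SIZE))
    # One object for the file entry plus one object per block.
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


# Ten million 100 KiB files stored individually.
num_small = 10_000_000
small_size = 100 * 1024
small = namenode_bytes(num_small, small_size)

# The same data merged into block-sized container files.
total_bytes = num_small * small_size
merged_files = -(-total_bytes // BLOCK_SIZE)
merged = namenode_bytes(merged_files, BLOCK_SIZE)

print(f"individual small files: {small / 1e9:.2f} GB of NameNode heap")
print(f"merged into {merged_files} files: {merged / 1e6:.2f} MB of NameNode heap")
```

Under these assumptions the same volume of data costs about three orders of magnitude less metadata once merged, which is the intuition behind merging-based techniques for small files.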