Pseudo-Cache-Based IoT Small Files Management Framework in HDFS Cluster

Isma Farah Siddiqui, Nawab Muhammad Faseeh Qureshi, Bhawani Shankar Chowdhry, Muhammad Aslam Uqaili
DOI: https://doi.org/10.1007/s11277-020-07312-3
IF: 2.017
2020-05-02
Wireless Personal Communications
Abstract: Internet of Things (IoT) devices generate an enormous number of files, which fall into two categories: (1) large files and (2) small files. Hadoop Distributed File System (HDFS) handles such datasets with its built-in archiving technique, Hadoop Archives (HAR), and stores them in data blocks of 64, 128 or 256 MB. This technique works for normal batch processing. However, when a streaming chunk of an IoT dataset is considered, it raises issues not addressed before: (1) improper file wrapping, (2) random access latency, (3) a slower Namenode and (4) wastage of block volume. This paper proposes a novel technique, the pseudo-cache-based small files management framework (PSFMF), which bypasses HAR with a novel logical file association mechanism and avoids consuming large amounts of memory to build HDFS blocks. The evaluation shows that PSFMF reduces memory consumption, increases MapReduce performance and reduces the task workload on the HDFS cluster.
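The abstract links the slower Namenode and the memory cost to the sheer number of small files. The following is a minimal sketch of why packing small files into block-sized containers helps. It assumes the commonly cited rule of thumb that every HDFS namespace object (file, directory or block) takes roughly 150 bytes of Namenode heap, and it uses a generic greedy, archive-style packer. It is not PSFMF's logical file association mechanism, whose details the abstract does not give. The names `PackedBlock`, `pack_small_files`, `namenode_bytes` and the 10 KB sensor workload are hypothetical.

```python
from dataclasses import dataclass, field

# Assumption (rule of thumb, not from the paper): each namespace object
# (file, directory or block) costs about 150 bytes of Namenode heap.
BYTES_PER_OBJECT = 150
MB = 1024 * 1024


@dataclass
class PackedBlock:
    """Hypothetical block-sized container holding small files back to back."""
    capacity: int
    used: int = 0
    # Logical index kept with the container: file name -> (offset, length).
    # It is not stored as Namenode objects in this model.
    index: dict = field(default_factory=dict)


def pack_small_files(sizes, block_size=128 * MB):
    """Greedily append files (name -> size in bytes) into block-sized containers."""
    blocks = [PackedBlock(block_size)]
    for name, size in sizes.items():
        if size > block_size:
            raise ValueError(f"{name} is larger than one block")
        # Start a new container when the current one cannot fit this file.
        if blocks[-1].used + size > block_size:
            blocks.append(PackedBlock(block_size))
        current = blocks[-1]
        current.index[name] = (current.used, size)
        current.used += size
    return blocks


def namenode_bytes(num_files, num_blocks):
    """Rough Namenode heap estimate: one object per file plus one per block."""
    return (num_files + num_blocks) * BYTES_PER_OBJECT


if __name__ == "__main__":
    # Hypothetical IoT workload: 100,000 sensor readings of 10 KB each.
    files = {f"sensor-{i}.json": 10 * 1024 for i in range(100_000)}

    # Unpacked: every small file is its own HDFS file with its own block.
    unpacked = namenode_bytes(len(files), len(files))

    # Packed: one container file per block-sized container.
    blocks = pack_small_files(files)
    packed = namenode_bytes(len(blocks), len(blocks))

    print(len(blocks), unpacked, packed)  # prints "8 30000000 2400"
```

Under these assumptions, 100,000 files of 10 KB fit into 8 containers of 128 MB, so the estimated Namenode footprint drops from 30,000,000 bytes to 2,400 bytes. The cost is that reading a single small file now needs an index lookup before seeking to its offset, which is one source of the random access latency the abstract lists for archive-based approaches.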
telecommunications
What problem does this paper attempt to address?