Abstract: In recent years, big data has been one of the most active areas of development in the information field. With the development of artificial intelligence, mobile smart terminals, and high-bandwidth wireless Internet, the volume of data of all types is growing exponentially. This data holds a great deal of potential value, so storing and processing it efficiently has become very important. The Hadoop Distributed File System (HDFS) has emerged as a typical representative of data-intensive distributed file systems for big data; it offers high fault tolerance and high throughput, and it can be deployed on low-cost hardware. HDFS nodes communicate with each other through the Remote Procedure Call (RPC) mechanism to keep the system working properly. However, the RPC mechanism in HDFS still leaves room for improvement in network throughput and abnormal response time. This paper presents an optimization method to improve the performance of HDFS. The proposed method dynamically adjusts the RPC configuration between the NameNode and the DataNodes by sensing the characteristics of the data stored in the DataNodes. This method effectively reduces the processing pressure on the NameNode and improves the network throughput of the information exchanged between the NameNode and the DataNodes. It also reduces the abnormal response time of the whole system. Finally, extensive experiments demonstrate the effectiveness and efficiency of the proposed method.

Zput: A speedy data uploading approach for the Hadoop Distributed File System

An Approach of Fast Data Manipulation in HDFS with Supplementary Mechanisms

RCFile: A Fast and Space-Efficient Data Placement Structure in MapReduce-based Warehouse Systems

DataMPI: Extending MPI to Hadoop-Like Big Data Computing

QDFS: A Quality-Aware Distributed File Storage Service Based on HDFS

QoSC: A QoS-Aware Storage Cloud Based on HDFS

A distributed storage method of remote sensing data based on image blocks organization

Improving Downloading Performance in Hadoop Distributed File System

Cloud Storage of Massive Remote Sensing Data Based on Distributed File System

A Novel Approach for Improving Security and Storage Efficiency on HDFS

Cumulus: A Distributed File System Based on Network Coding

Optimizing Hadoop Block Placement Policy and Cluster Blocks Distribution

A Low Cost Algorithm for Fast Intelligent Content Delivery Based on Hadoop

A Data-Aware Remote Procedure Call Method for Big Data Systems

Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture

Fast Off-Site Backup and Recovery System for HDFS

Accelerate Data Sharing In A Wide-Area Networked File Storage System

Storage-Optimization Method for Massive Small Files of Agricultural Resources Based on Hadoop

Uncoupled MapReduce: A Balanced and Efficient Data Transfer Model

Clover: A Distributed File System of Expandable Metadata Service Derived from HDFS

Hybrid storage architecture and efficient MapReduce processing for unstructured data