A Hadoop-Based Performance Optimization of Network Stream Input Format

Xiao Ping Wang, Jiang Tao Luo, Wei Gao, Yong Liu
DOI: https://doi.org/10.4028/www.scientific.net/amm.644-650.2906
2014-01-01
Applied Mechanics and Materials
Abstract: Network stream analysis is one of the essential applications of industrial research in the era of big data. However, the input formats of Hadoop, the major platform for massive data applications, do not support network streams sufficiently. This paper proposes a feasible optimization design. First, the HDFS block-storage structure and the particular libpcap file format of network streams are considered. Then input files are pre-processed into files as large as the HDFS block size, and a new data input format called blockPcapInputFormat is implemented by extending Hadoop's fileInputFormat. Furthermore, experiments are performed to verify the effectiveness of the proposed design. The results show that the optimization scheme not only accelerates the processing of libpcap files effectively, but is also suitable for applications in which Hadoop parses network streams.
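The pre-processing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `split_pcap`, the choice to copy the 24-byte global header into every piece, and the handling of oversized or truncated records are all assumptions made for illustration. It only relies on the classic libpcap layout (a 24-byte global header followed by records that each carry a 16-byte header with the captured length) and cuts the capture at packet boundaries so that each piece fits within a given block size.

```python
import struct

GLOBAL_HEADER_LEN = 24   # classic libpcap global header
RECORD_HEADER_LEN = 16   # ts_sec, ts_usec, incl_len, orig_len


def split_pcap(data: bytes, block_size: int) -> list:
    """Split a classic libpcap capture into standalone pieces.

    Assumption (not from the paper): every piece starts with a copy of the
    global header and holds only whole packet records, so each piece can be
    parsed independently. A piece exceeds block_size only when a single
    record is itself larger than the space left after the header.
    """
    if len(data) < GLOBAL_HEADER_LEN:
        raise ValueError("input is shorter than a libpcap global header")

    # The magic number tells us the byte order used by the writer.
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic (microsecond) libpcap file")

    global_header = data[:GLOBAL_HEADER_LEN]
    pieces = []
    current = bytearray(global_header)
    offset = GLOBAL_HEADER_LEN

    while offset + RECORD_HEADER_LEN <= len(data):
        _ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        record_end = offset + RECORD_HEADER_LEN + incl_len
        if record_end > len(data):
            break  # truncated trailing record: dropped in this sketch

        record_len = record_end - offset
        # Start a new piece when this record would overflow the block,
        # unless the current piece holds no records yet.
        if len(current) + record_len > block_size and len(current) > GLOBAL_HEADER_LEN:
            pieces.append(bytes(current))
            current = bytearray(global_header)

        current += data[offset:record_end]
        offset = record_end

    if len(current) > GLOBAL_HEADER_LEN:
        pieces.append(bytes(current))
    return pieces
```

Because no packet record straddles two pieces, a record reader working on one piece never has to search for a packet boundary in the middle of a block, which is presumably the property a block-aligned input format would depend on.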