A Modeling Language for MapReduce Programming in a Storage System Perspective

Yuxin Jing, Hanpin Wang, Yu Huang, Lei Zhang, Yongzhi Cao
DOI: https://doi.org/10.1007/s11265-017-1298-7
2017-01-01
Journal of Signal Processing Systems
Abstract: MapReduce is a powerful programming model for distributed data analysis. It runs on big data storage systems and processes data in parallel. Formal analysis is an appropriate way to ensure the correctness of MapReduce programs, and it first requires a formal model of MapReduce. In this paper we propose a modeling language for building a formal model of the MapReduce framework. Unlike other approaches, our language describes how MapReduce programs process data from the perspective of the underlying files and blocks, so that the details of data processing can be shown clearly. The language builds on our previous work, a language for describing the management of massive data storage systems, and extends it in two respects: refinement of block content data and support for concurrency. Using this language, the features of the MapReduce programming model can be discussed.